Greetings,
I've taken over from a prior team and writing ETL jobs which process csv files. I use a combination of shell scripts and perl on ubuntu. The csv files are huge; they arrive as zipped archives. Unzipped, many are more than 30Gb - yes, that's a G
Legacy process is a batch job running on cron that unzips each file entirely, reads and copies the first line of it into a config file, then re-zips the entire file. Some days this takes many many hours of processing time, for no benefit.
Can you suggest a method to only extract the first line (or first few lines) from each file inside a zipped archive, without fully unpacking the archives?