I have a very large comma-delimited file coming out of a database report that looks something like this:
field1,field2,field3,metricA,value1
field1,field2,field3,metricB,value2
I want the new file to combine those lines so it would look something like this:
field1,field2,field3,value1,value2
I'm able to do this using a hash. In this example, the first three fields are the key, and I combine value1 and value2 in a certain order to be the value. After I've read in the file, I just print the hash table's keys and values out to another file. Works fine.
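For reference, here is a minimal sketch of the hash approach I'm describing. It reads from an in-memory sample instead of the real report file (in practice the filehandle would be opened on the 8 GB file and the results printed to an output file), and it assumes exactly the two metric names, metricA and metricB, shown in the example rows:

```perl
use strict;
use warnings;

# In-memory sample standing in for the real report file.
my $sample = "field1,field2,field3,metricA,value1\n"
           . "field1,field2,field3,metricB,value2\n";

open my $in, '<', \$sample or die "Cannot open input: $!";

my %combined;
while (my $line = <$in>) {
    chomp $line;
    my ($f1, $f2, $f3, $metric, $value) = split /,/, $line;
    # The first three fields form the key; each metric's value is
    # stored under it so the output order can be fixed below.
    $combined{ join ',', $f1, $f2, $f3 }{$metric} = $value;
}
close $in;

# Emit one combined line per key: value1 (metricA) before value2 (metricB).
for my $key (sort keys %combined) {
    print join(',', $key,
               $combined{$key}{metricA} // '',
               $combined{$key}{metricB} // ''), "\n";
}
# prints "field1,field2,field3,value1,value2"
```

The `//` defined-or operator needs Perl 5.10 or later; it just emits an empty field when one of the two metrics is missing for a key. The problem is that %combined holds every key from the file at once, which is exactly the memory footprint I'm worried about.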
However, I have some concerns, since my files are going to be very large: about 8 GB each.
Would there be a more efficient way of doing this? I'm not thinking in terms of speed, but in terms of memory footprint. I'm concerned that this process could die due to memory issues. I'm just drawing a blank on a solution that would work without ultimately shoving everything into a very large hash.
For full disclosure, I'm using ActiveState Perl on Windows.