Hi,
Disclaimer: I'm not a programmer, never was, never learned algorithms, CS, etc. Just have to work with it.
My question is: I need to split a huge (over 4 GB) CSV file into smaller ones based on the first field (and then process them with require 'win32ole'). In awk it's rather easy:
awk -F ',' '{myfile=$1 ; print $0 >> (myfile".csv")}' KNAGYFILE.csv
But with Ruby, I did:
open('hugefile').each { |hline|
  accno = hline[0,12]
  nline = hline[13,10000].gsub(/;/,",")
  accfile = File.open("#{accno.to_s}.csv", "a")
  accfile.puts nline
  accfile.close
}
Then I realized that it's resource-inefficient (the output file is opened and closed once for every input line). I'm sure there's a better way to do it; could you explain how?
UPDATE: I forgot to mention that the file is sorted on the first column. E.g. if this is hugefile:
012345678901,1,1,1,1,1,1
012345678901,1,2,1,1,1,1
012345678901,1,1,A,1,1,1
012345678901,1,1,1,1,A,A
A12345678901,1,1,1,1,1,1
A12345678901,1,1,1,1,1,1
A12345678901,1,1,1,1,1,1
A12345678901,1,1,1,1,1,1
Then I need two new files, named 012345678901.csv and A12345678901.csv.
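Because the file is sorted on the first column, one possible direction is to keep a single output file open and switch to a new one only when the first field changes, so each output file is opened once instead of once per line. What follows is only a rough sketch of that idea, not a tested solution for the real data. The method name split_sorted_csv and its out_dir parameter are made up for illustration, and it assumes a comma-separated file where the whole line is copied unchanged (as the awk one-liner does), rather than stripping the account number as my loop above does.

```ruby
# Sketch: split a CSV that is already sorted on its first field into one
# file per key, opening each output file only once.
# Assumptions (hypothetical, not from the real data): comma-separated input,
# the key is everything before the first comma, lines are copied unchanged.
def split_sorted_csv(input_path, out_dir = ".")
  current_key = nil
  out = nil
  written = []

  # File.foreach reads one line at a time, so the 4 GB file is never
  # loaded into memory as a whole.
  File.foreach(input_path) do |line|
    next if line.strip.empty?            # skip blank lines

    key = line.split(",", 2).first.chomp
    if key != current_key
      out&.close                         # close the previous key's file
      path = File.join(out_dir, "#{key}.csv")
      out = File.open(path, "a")         # append, like >> in the awk version
      written << path
      current_key = key
    end
    out.write(line)                      # line still ends with its newline
  end

  written                                # paths of the files that were opened
ensure
  out&.close                             # close the last file, even on error
end
```

Calling split_sorted_csv("hugefile") would then create 012345678901.csv and A12345678901.csv in the current directory for the sample above. If the input turned out not to be fully sorted, the append mode means a key that reappears later is still written to the right file; it just costs one extra open.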