Hi all,
We have a process which takes a very large CSV (1.6 GB) and breaks it down into pieces (in this case 3). This runs nightly and normally doesn't give us any problems. When it ran last night, however, the first of the output files had lost precision on the numeric fields in the data. The active part of the script is these lines:
while lineCounter <= chunk:
    oOutFile.write(oInFile.readline())
    lineCounter = lineCounter + 1
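For context, the surrounding code is wired up roughly like this (a simplified sketch rather than the exact script; the file names are illustrative, and count_lines is the line-counting helper pasted at the end of this post):

# rough shape of the nightly split: one input file, three output pieces
oInFile = open('big_input.csv', 'r')
chunk = count_lines('big_input.csv') // 3 + 1   # lines per piece
for pieceNumber in range(3):
    oOutFile = open('piece_%d.csv' % pieceNumber, 'w')
    lineCounter = 1
    while lineCounter <= chunk:
        oOutFile.write(oInFile.readline())
        lineCounter = lineCounter + 1
    oOutFile.close()
oInFile.close()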
and the normal output might be something like
StringField1; StringField2; StringField3; StringField4; 1000000; StringField5; 0.000054454
etc.
On this one occasion, and in this one output file, the numeric fields all came out padded or rounded to six decimal places, i.e.
StringField1; StringField2; StringField3; StringField4; 1000000.000000; StringField5; 0.000000
We are using Python v2.6 (and don't want to upgrade unless we really have to), but we can't afford to lose this data. Does anyone have any idea why this might have happened? If readline() is doing some kind of implicit conversion, is there a way to do a binary read instead? We really just want this data to pass through untouched.
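If readline() really is to blame, the sort of binary pass-through we have in mind is something like this (untested sketch; the names are illustrative and chunk is the same per-piece line count as above):

# open both files in binary mode and copy the raw bytes, so nothing can
# reinterpret the fields on the way through
oInFile = open('big_input.csv', 'rb')
oOutFile = open('piece_0.csv', 'wb')
chunk = count_lines('big_input.csv') // 3 + 1
lineCounter = 1
while lineCounter <= chunk:
    line = oInFile.readline()
    if not line:                  # stop cleanly if the file ends early
        break
    oOutFile.write(line)
    lineCounter = lineCounter + 1
oOutFile.close()
oInFile.close()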
It is very weird to us that this only affected one of the output files generated by the same script, and that when it was rerun the output was as expected.
thanks
Jack
(the line-counting function, count_lines, referenced in the thread below)
def count_lines(filename):
    # count newline characters by reading the file in fixed-size buffers
    f = open(filename)
    lines = 0
    buf_size = 1024 * 1024
    read_f = f.read  # loop optimization: avoid the attribute lookup on every pass
    buf = read_f(buf_size)
    while buf:
        lines += buf.count('\n')
        buf = read_f(buf_size)
    f.close()
    return lines
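A quick way to use it after the split, just to check nothing was dropped (paths are illustrative):

# the three pieces together should have as many lines as the input
total_in = count_lines('big_input.csv')
total_out = sum(count_lines('piece_%d.csv' % n) for n in range(3))
print total_in, total_out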