+2  Q: 

Python file size

I am trying to split up a large XML file into smaller chunks. I write to the output file and then check its size to see if it has passed a threshold, but I don't think the getsize() method is working as expected.

What would be a good way to get the size of a file that is still being written to?

I've done something like this...

import os

f1 = open('VSERVICE.xml', 'r')
f2 = open('split.xml', 'w')

for line in f1:
  if str(line) == '</Service>\n':
    break
  else:
    f2.write(line)
    size = os.path.getsize('split.xml')
    print('size = ' + str(size))

Running this prints 0 as the file size for about 80 iterations and then 4176. Does Python store the output in a buffer before actually writing it out?

+3  A: 

Yes, Python is buffering your output. You'd be better off tracking the size yourself, something like this:

size = 0
for line in f1:
  if str(line) == '</Service>\n':
    break
  else:
    f2.write(line)
    size += len(line)
    print('size = ' + str(size))

(That might not be 100% accurate, e.g. on Windows each line takes one more byte on disk because '\n' is written as '\r\n' in text mode, and non-ASCII characters can encode to more than one byte, but it should be good enough for simple chunking.)
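
If the count ever needs to match the file on disk exactly, one possible sketch is to count encoded bytes and switch off newline translation. The helper name copy_until, the max_bytes parameter and the demo data are made up for illustration, and UTF-8 is an assumption about the output encoding:

```python
import os
import tempfile


def copy_until(lines, out_path, max_bytes, encoding="utf-8"):
    """Write lines to out_path until the next line would exceed max_bytes.

    Returns the number of bytes written. Hypothetical helper, not part of
    the original answer.
    """
    written = 0
    # newline="" turns off newline translation, so "\n" stays one byte on
    # every platform and the running count matches the file on disk.
    with open(out_path, "w", encoding=encoding, newline="") as out:
        for line in lines:
            nbytes = len(line.encode(encoding))  # bytes, not characters
            if written + nbytes > max_bytes:
                break
            out.write(line)
            written += nbytes
    return written


# Small demo with made-up data: each line is 10 bytes in UTF-8
# (9 characters, one of which encodes to 2 bytes).
demo_lines = ["<a>\u00e9</a>\n"] * 10
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "split.xml")
    demo_written = copy_until(demo_lines, demo_path, max_bytes=35)
    demo_on_disk = os.path.getsize(demo_path)  # read after the file is closed
```

Here the running total and os.path.getsize() agree once the file is closed, because nothing is left in the buffer at that point.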

RichieHindle
Thanks! That should work. I don't need it to be 100% accurate.
Maulin
A: 

Tracking the size yourself will be fine for your case. A different way would be to flush the file buffers just before you check the size:

f2.write(line)
f2.flush()  # <-- Python's buffer is handed to the OS, so getsize() can see it
size = os.path.getsize('split.xml')

Doing that too often will slow down file I/O, of course.
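
A small self-contained sketch of the effect, using a temporary file rather than the files from the question (the path and variable names here are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "split.xml")
    with open(path, "w") as f:
        f.write("<Service>\n")
        # The data is usually still sitting in Python's buffer here,
        # so the size reported for the file lags behind.
        size_before = os.path.getsize(path)
        f.flush()
        # After flush() the OS has the data and getsize() reflects it.
        size_after = os.path.getsize(path)

print("before flush:", size_before)
print("after flush:", size_after)
```

The exact size after the flush depends on the platform's newline handling, so treat the printed numbers as indicative only.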

efotinis
+1  A: 

Have you tried replacing os.path.getsize with the file object's own tell() method, like this:

f2.write(line)
size = f2.tell()
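
A rough sketch of that idea, assuming the output is opened in binary mode so that tell() is a plain byte offset (for text-mode files the Python docs describe the value as an opaque number). The file name and data are made up:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "split.xml")
    with open(path, "wb") as f:
        f.write(b"<Service>\n")
        # tell() reports the current write position, including bytes
        # that are still buffered and not yet visible to getsize().
        pos = f.tell()
    final_size = os.path.getsize(path)  # read after the file is closed

print(pos, final_size)
```

This avoids a flush on every line, since the position is tracked by the file object itself.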
Piotr Czapla