ansaurus

Question

Save memory in Python. How to iterate over the lines and save them efficiently with a 2-million-line file?

Answer 1

+2 A:

This looks perfectly fine to me. Iterating over the file like that, or using xreadlines() (Python 2 only, and deprecated there in favor of plain iteration), will read each line as needed (with sane buffering behind the scenes). Memory usage should not grow as you read in more and more data.
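To illustrate the point about lazy reading, here is a minimal sketch, not the asker's code: the function name `count_nonempty_lines` and the throwaway sample file are assumptions made up for the example. It shows that looping over the file object directly handles one line at a time instead of loading the whole file.

```python
import os
import tempfile


def count_nonempty_lines(path):
    """Iterate lazily: only the current line is held by the loop."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # the file object is its own line iterator
            if line.strip():
                count += 1
    return count


# Build a small throwaway file to exercise the function (hypothetical data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("a\tb\n\nc\td\n")
    sample_path = tmp.name

nonempty = count_nonempty_lines(sample_path)
os.remove(sample_path)
```

By contrast, calling `handle.readlines()` or `handle.read()` would build the full contents in memory first, which is what you want to avoid with a file of this size.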

As for performance, you should profile your app. Most likely the bottleneck is somewhere in a deeper function, like POI.save().
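A rough sketch of what such profiling could look like with the standard library's `cProfile`. The functions `parse_line`, `save_record` and `import_lines` are hypothetical stand-ins (in particular, `save_record` only imitates a call like POI.save() and does not touch a database).

```python
import cProfile
import io
import pstats


def parse_line(line):
    """Split one tab-separated line into fields."""
    return line.rstrip("\n").split("\t")


def save_record(fields):
    """Hypothetical stand-in for POI.save(); just does a little work."""
    return sum(len(field) for field in fields)


def import_lines(lines):
    """Parse and 'save' every line, returning a running total."""
    total = 0
    for line in lines:
        total += save_record(parse_line(line))
    return total


profiler = cProfile.Profile()
profiler.enable()
result = import_lines(["a\tbb\tccc\n"] * 1000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
profile_text = report.getvalue()
```

Printing `profile_text` lists the functions with the highest cumulative time first, which is usually enough to see whether parsing or saving dominates.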

Max Shawabkeh 2010-03-13 23:37:58

Answer 2

+2 A:

There's nothing to worry about in the data you've given us: is memory consumption going UP as you read more and more lines? Now that would be cause for worry -- but there's no indication that this would happen in the code you've shown, assuming that p.save() saves the object to some database or file and not in memory, of course. There's nothing real to be gained by adding del statements, as the memory is getting recycled at each iteration of the loop anyway.

This could be sped up if there's a faster way to populate a POI instance than binding its attributes one by one -- e.g., passing those attributes (maybe as keyword arguments? positional would be faster...) to the POI constructor. But whether that's the case depends on that geonames.models module, of which I know nothing, so I can only offer very generic advice -- e.g., if the module lets you save a bunch of POIs in a single gulp, then making them (say) 100 at a time and saving them in bunches should yield a speedup (at the cost of slightly higher memory consumption).
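The batching idea can be sketched generically as follows. This is only an illustration under assumptions: `batched` and `save_many` are hypothetical helpers, and `save_many` merely records batch sizes where a real bulk-save call would go (whether geonames.models offers one is not known here).

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items, consuming the input lazily."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def save_many(records, sink):
    """Hypothetical bulk save: here it only records the batch size."""
    sink.append(len(records))


saved_batches = []
lines = ("line %d" % i for i in range(250))  # stands in for the file
for batch in batched(lines, 100):
    save_many(batch, saved_batches)
```

Only one batch of 100 items is alive at a time, so memory stays bounded while the number of save calls drops by roughly a factor of 100. If POI is a Django model, later Django versions provide `bulk_create` on the model manager as one such bulk API.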

Alex Martelli 2010-03-13 23:38:56

Thanks for the comment. The increased memory consumption was caused by Django's DEBUG db logging. I will keep your advice in mind for future performance increases. Simply setting DEBUG to False keeps memory usage steady at 1%, as we would expect.

skyl 2010-03-14 00:15:14

Answer 3

+4 A:

Make sure that Django's DEBUG setting is set to False.
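As a minimal sketch, the relevant fragment of a project's settings module would look like this (the file name settings.py is the usual convention, assumed here):

```python
# settings.py (fragment)
# With DEBUG = True, Django records the SQL queries it runs on the
# connection object for debugging, so a long-running import loop keeps
# accumulating entries in memory.
DEBUG = False
```

If DEBUG has to stay on during development, calling `django.db.reset_queries()` periodically inside the loop clears the recorded queries.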

ericflo 2010-03-13 23:53:31

Yep, looks like this actually turned out to be a Django question rather than a Python question. The accumulating memory was due to Django's DEBUG logging.

skyl 2010-03-14 00:05:00
