I have a Python program that dies with a MemoryError when I feed it a large file. Are there any tools that I could use to figure out what's using the memory?

This program ran fine on smaller input files. The program obviously needs some scalability improvements; I'm just trying to figure out where. "Benchmark before you optimize", as a wise person once said.

(Just to forestall the inevitable "add more RAM" answer: This is running on a 32-bit WinXP box with 4GB RAM, so Python has access to 2GB of usable memory. Adding more memory is not technically possible. Reinstalling my PC with 64-bit Windows is not practical.)

EDIT: Oops, this is a duplicate of http://stackoverflow.com/questions/110259/python-memory-profiler

+4  A: 

Heapy is a memory profiler for Python, which is the type of tool you need.

Wim
BTW duplicate of this one: http://stackoverflow.com/questions/110259/python-memory-profiler
Wim
A: 

The simplest and most lightweight way would likely be to use Python's built-in memory query capabilities, such as sys.getsizeof: run it on your objects for a reduced problem (i.e. a smaller file) and see what takes a lot of memory.
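
A minimal sketch of that approach, assuming you can get hold of the suspect objects by name. The `candidates` dictionary and the `shallow_sizes` helper are hypothetical stand-ins for illustration, not part of the asker's program:

```python
import sys

# Hypothetical stand-ins for objects built while parsing a small input file.
candidates = {
    "line_buffer": "x" * 10_000,
    "record_ids": list(range(1_000)),
    "lookup": {i: str(i) for i in range(1_000)},
}


def shallow_sizes(objects):
    """Return (name, size_in_bytes) pairs, largest first.

    sys.getsizeof reports only the object itself, not what it refers to.
    """
    sizes = [(name, sys.getsizeof(obj)) for name, obj in objects.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


for name, size in shallow_sizes(candidates):
    print(f"{name}: {size} bytes")
```

The exact byte counts depend on the Python version and on whether the build is 32-bit or 64-bit, so treat them as relative rather than absolute.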

Eli Bendersky
Good and quick solution. Some limitations though, since you need to know which object it is (or have an educated guess). Also, doing this on a list with 100 objects of 100 MB each will return the size of 100 pointers (so only a few KB)...
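
To illustrate the container limitation, here is a rough sketch that follows references inside the built-in containers. `deep_getsizeof` is a hypothetical helper written for this example; it does not follow instance attributes, so it is only an approximation of what a dedicated tool such as Pympler would report:

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Roughly total the size of obj plus the objects it contains.

    Only dict, list, tuple, set and frozenset are traversed. Objects
    reachable more than once are counted a single time.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size


# Ten separate 1 MB byte strings held in one list.
blobs = [bytes(1_000_000) for _ in range(10)]
print("shallow:", sys.getsizeof(blobs), "bytes")
print("deep:   ", deep_getsizeof(blobs), "bytes")
```

The shallow figure covers only the list's own header and pointer array, while the deep figure also includes the ten byte strings.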
Wim
A: 

In your case, the answer is probably very simple: do not read the whole file at once; process it chunk by chunk. That may be easy or complicated, depending on your usage scenario. For example, an MD5 checksum of a huge file can be computed much more efficiently without reading the whole file into memory. That change dramatically reduced memory consumption in some SCons usage scenarios, but the problem was almost impossible to trace with a memory profiler.
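
As a sketch of the chunked MD5 idea (not the actual SCons change), the following reads a file in fixed-size blocks and feeds each block to hashlib. The function name `md5_of_file` and the 1 MB default chunk size are assumptions chosen for illustration:

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading chunk_size bytes at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # read() returns b"" at end of file, which ends the iteration.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Peak memory use stays near `chunk_size` regardless of the file size, whereas calling `read()` with no argument would hold the whole file in memory at once.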

If you still need a memory profiler: eliben already suggested sys.getsizeof. If that doesn't cut it, try Heapy or Pympler.

Pankrat
A: 

You asked for a tool recommendation:

Python Memory Validator lets you monitor memory usage, allocation locations, GC collections, object instances, memory snapshots, etc. in your Python application. Windows only.

http://www.softwareverify.com/python/memory/index.html

Disclaimer: I was involved in the creation of this software.

Stephen Kellett