views: 1072
answers: 5
My production system occasionally exhibits a memory leak I have not been able to reproduce in a development environment. I've used a Python memory profiler (specifically, Heapy) with some success in the development environment, but it can't help me with things I can't reproduce, and I'm reluctant to instrument our production system with Heapy because it takes a while to do its thing and its threaded remote interface does not work well in our server.

What I think I want is a way to dump a snapshot of the production Python process (or at least gc.get_objects), and then analyze it offline to see where it is using memory. How do I get a core dump of a python process like this? Once I have one, how do I do something useful with it?

+1  A: 

I don't know how to dump an entire python interpreter state and restore it. It would be useful, so I'll keep an eye on this answer in case anyone else has ideas.

If you have an idea where the memory is leaking, you can add checks on the refcounts of your objects. For example:

import sys

x = SomeObject()
# ... later ...
oldRefCount = sys.getrefcount(x)
suspiciousFunction(x)
if oldRefCount != sys.getrefcount(x):
    print("Possible memory leak...")

You could also check for reference counts higher than some number that is reasonable for your app. To take it further, you could modify the Python interpreter to do these kinds of checks by replacing the Py_INCREF and Py_DECREF macros with your own. This might be a bit dangerous in a production app, though.
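A rough sketch of what such a check might look like in-process, using only the standard library (the threshold is an arbitrary placeholder you would tune for your application):

import gc
import sys

# Arbitrary cutoff; tune it to what is "reasonable" for your app.
REFCOUNT_THRESHOLD = 1000

for obj in gc.get_objects():
    count = sys.getrefcount(obj)
    if count > REFCOUNT_THRESHOLD:
        # Print the type rather than the object itself, since repr() of a
        # large leaked container could be expensive.
        print(type(obj).__name__, count)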

Here is an essay with more information on debugging these sorts of things. It's geared more toward plugin authors, but most of it applies.

Debugging Reference Counts

Unfortunately that "idea of where memory is leaking" is what I am trying to get. That look-for-high-reference-counts approach might be useful though.
keturn
+2  A: 

The gc module has some functions that might be useful, like listing all objects the garbage collector found to be unreachable but cannot free, or a list of all objects being tracked.
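For example, a quick in-process survey using only the standard library might look something like this (the output format is arbitrary):

import gc
from collections import Counter

# Force a collection so gc.garbage is up to date, then report objects the
# collector found unreachable but could not free.
gc.collect()
print(len(gc.garbage), "uncollectable objects")

# Count every object the collector is tracking, grouped by type, to see
# which types are accumulating.
counts = Counter(type(obj).__name__ for obj in gc.get_objects())
for type_name, count in counts.most_common(20):
    print(type_name, count)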

If you have a suspicion which objects might leak, the weakref module could be handy to find out if/when objects are collected.
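A minimal sketch of that idea, where SomeObject and suspiciousFunction stand in for your own code:

import weakref

def report_collected(ref):
    # Called by the interpreter when the referent is garbage collected.
    print("suspect object was collected")

suspect = SomeObject()
suspect_ref = weakref.ref(suspect, report_collected)

suspiciousFunction(suspect)
del suspect

# If the weak reference still resolves, something is keeping the object alive.
if suspect_ref() is not None:
    print("suspect object is still alive")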

Torsten Marek
Yeah. If I could just figure out how to grab something like gc.get_objects and export it to be analyzed later. I don't think I can use pickle for that, as not everything is pickleable.
keturn
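One way to sidestep the pickling problem mentioned above is to export only per-object summary data instead of the objects themselves; a rough sketch (the filename and the choice of fields are arbitrary):

import gc
import json
import sys

# Write one JSON line per tracked object with only plain summary data:
# type name, shallow size, and reference count.
with open("snapshot.jsonl", "w") as out:
    for obj in gc.get_objects():
        out.write(json.dumps({
            "type": type(obj).__name__,
            "size": sys.getsizeof(obj),
            "refcount": sys.getrefcount(obj),
        }) + "\n")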
+2  A: 

Could you record the traffic (via a log) on your production site, then replay it on your development server instrumented with a Python memory debugger? (I recommend Dozer: http://pypi.python.org/pypi/Dozer)
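If both servers run WSGI apps, the recording and the Dozer instrumentation can each be done by wrapping the application; a sketch, where my_wsgi_app is a placeholder for your own application and Dozer is used in the usual WSGI-middleware style (check its docs for the exact options):

from dozer import Dozer

def record_requests(app, log_path="requests.log"):
    # Tiny WSGI middleware sketch: append each request line to a log so the
    # traffic can be replayed against a development server later.
    def middleware(environ, start_response):
        with open(log_path, "a") as log:
            log.write("%s %s?%s\n" % (environ.get("REQUEST_METHOD"),
                                      environ.get("PATH_INFO"),
                                      environ.get("QUERY_STRING", "")))
        return app(environ, start_response)
    return middleware

# In production: application = record_requests(my_wsgi_app)
# In development, while replaying the log: application = Dozer(my_wsgi_app)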

Brett
Might be worth a shot. There are various considerations, like how much disk write I/O it would use up and how to line up the right database snapshot with the recorded inputs, but if we could get it working, it would be an incredibly useful tool.
keturn
+2  A: 

Make your program dump core, then clone an instance of the program on a sufficiently similar box using gdb. There are special macros to help with debugging Python programs within gdb, but if you can get your program to concurrently serve up a remote shell, you could just continue the program's execution and query it with Python.

I have never had to do this, so I'm not 100% sure it'll work, but perhaps the pointers will be helpful.
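For the "remote shell" part, even a very crude expression-only eval loop in a background thread can be enough to poke at gc.get_objects() in the live process. A sketch (no authentication, one connection at a time, so bind it to localhost and treat it strictly as a debugging hack):

import gc
import socket
import threading
import traceback

def remote_shell(port=4444):
    # Each line received is evaluated as a Python expression in this process
    # and the repr() of the result is written back to the client.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    namespace = {"gc": gc}
    while True:
        client, _ = server.accept()
        stream = client.makefile("rw")
        for line in stream:
            try:
                stream.write(repr(eval(line, namespace)) + "\n")
            except Exception:
                stream.write(traceback.format_exc())
            stream.flush()
        client.close()

threading.Thread(target=remote_shell, daemon=True).start()

Connect with something like "nc 127.0.0.1 4444" and type, for example, len(gc.get_objects()).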

fivebells
Installing a Manhole would be incredibly helpful, yes. Sadly, this is not a Twisted application.
keturn
A: 

Meliae looks promising:

This project is similar to heapy (in the 'guppy' project), in its attempt to understand how memory has been allocated.

Currently, its main difference is that it splits the task of computing summary statistics, etc of memory consumption from the actual scanning of memory consumption. It does this, because I often want to figure out what is going on in my process, while my process is consuming huge amounts of memory (1GB, etc). It also allows dramatically simplifying the scanner, as I don't allocate python objects while trying to analyze python object memory consumption.
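The usual workflow, as I understand it, is to dump from the live process and analyze the dump on another machine; roughly (the path is a placeholder, and the exact API is worth checking against the Meliae docs):

# In the production process (e.g. from a remote shell or a signal handler):
from meliae import scanner
scanner.dump_all_objects("/tmp/process-memory.json")

# Later, offline, on any machine:
from meliae import loader
om = loader.load("/tmp/process-memory.json")
print(om.summarize())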

keturn