views: 80
answers: 4
Hi,

I have a long-running Python process that is generating more data than I planned for. My results are stored in a list that will be serialized (pickled) and written to disk when the program completes -- if it gets that far. But at this rate, it's more likely that the list will exhaust all 1+ GB free RAM and the process will crash, losing all my results in the process.

I plan to modify my script to write results to disk periodically, but I'd like to save the results of the currently-running process if possible. Is there some way I can grab an in-memory data structure from a running process and write it to disk?
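(For future runs, the periodic-write change you're planning might look something like this sketch. Everything here is my own invention, not from your script: `checkpoint`, the flush interval, and the `range(10)` stand-in for your real workload.)

```python
import os
import pickle

CHECKPOINT_EVERY = 3  # flush interval -- pick whatever suits your run (hypothetical value)

def checkpoint(results, path):
    """Rewrite the whole results list to disk, keeping the previous
    checkpoint intact if we crash mid-write."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(results, f)
    os.rename(tmp, path)  # atomic on POSIX, so a crash never corrupts the checkpoint

def run(path):
    results = []
    for i, item in enumerate(range(10)):  # range(10) stands in for your real workload
        results.append(item)
        if (i + 1) % CHECKPOINT_EVERY == 0:
            checkpoint(results, path)
    checkpoint(results, path)  # final flush
    return results
```

The temp-file-then-rename dance matters here: if the process dies partway through a dump, the last complete checkpoint is still on disk.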

I found code.interact(), but since I don't have this hook in my code already, it doesn't seem useful to me (http://stackoverflow.com/questions/1637198/method-to-peek-at-a-python-program-running-right-now).

I'm running Python 2.5 on Fedora 8. Any thoughts?

Thanks a lot.

Shahin

A: 

+1 Very interesting question.

I don't know how well this will work for you (especially since I don't know whether you'll reuse the pickled list in the program), but I would suggest this: as you generate results, print the list to STDOUT. When you run your Python script (I'm guessing from the command line), redirect the output to append to a file like so:

python myScript.py >> logFile. 

This should store all the lists in logFile. This way, you can always take a look at what's in logFile, and you should have the most up-to-date data structures in there (depending on where you call print).
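(A refinement on this, sketched by me rather than taken from the answer above: if you print the repr() of the list rather than its str(), simple values can be parsed back out of the log later. The helper names `dump_line` and `load_last` are my own; `ast.literal_eval` is Python 2.6+, so on 2.5 you'd have to fall back to plain `eval`.)

```python
import ast

def dump_line(results):
    """Emit the current results as one parseable line (works for ints,
    strings, lists, etc. -- anything whose repr is a Python literal)."""
    return repr(results)

def load_last(log_text):
    """Recover the most recent results list from the redirected output:
    each print produced one line, so the last line is the newest snapshot."""
    last = log_text.strip().splitlines()[-1]
    return ast.literal_eval(last)
```

In the loop you would do `print dump_line(results)` each iteration, run `python myScript.py >> logFile` as suggested, and later call `load_last` on the file's contents to get a real list back instead of eyeballing text.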

Hope this helps

inspectorG4dget
+2  A: 

There is not much you can do for a running program. The only thing I can think of is to attach the gdb debugger, stop the process and examine the memory. Alternatively, make sure that your system is set up to save core dumps, then kill the process with kill -SEGV <pid>. You should then be able to open the core dump with gdb and examine it at your leisure.

There are some gdb macros that will let you examine python data structures and execute python code from within gdb, but for these to work you need to have compiled python with debug symbols enabled and I doubt that is your case. Creating a core dump first then recompiling python with symbols will NOT work, since all the addresses will have changed from the values in the dump.

Here are some links for introspecting python from gdb:

http://wiki.python.org/moin/DebuggingWithGdb

http://chrismiles.livejournal.com/20226.html

or google for 'python gdb'

N.B. To make Linux create core dumps, use the ulimit command.

ulimit -a will show you what the current limits are set to.

ulimit -c unlimited will enable core dumps of any size.
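(If you'd rather flip that switch from inside Python, a sketch of my own rather than part of Dave's answer: the stdlib resource module exposes the same limit, and child processes inherit it.)

```python
import resource

def enable_core_dumps():
    """Raise the soft core-file size limit as far as the hard limit allows.
    Equivalent to `ulimit -c unlimited` when the hard limit is unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Call this at the top of the script; after that, killing the process with SIGSEGV should leave a core file (subject to where the system's core_pattern puts it).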

Dave Kirby
Too bad. This sounds useful more generally, though, so I'll give it a shot. Thanks for the detailed response.
Shahin
+1  A: 

While certainly not very pretty, you could try to access the data of your process through the proc filesystem at /proc/[pid-of-your-process]. The proc filesystem stores a lot of per-process information, such as currently open file descriptors, memory maps and what not. With a bit of digging you might be able to access the data you need.
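(As a taste of what's in there, an illustration of mine rather than part of this answer: /proc/<pid>/status already answers the "how close am I to exhausting RAM" question, no digging in /proc/<pid>/mem required. Actually recovering the list itself from raw memory would mean decoding CPython object layouts by hand, which is the "bit of digging" part.)

```python
import os

def resident_kb(pid=None):
    """Return the process's resident set size in kB, parsed from
    /proc/<pid>/status, or None on systems without a proc filesystem."""
    pid = pid if pid is not None else os.getpid()
    path = "/proc/%d/status" % pid
    if not os.path.exists(path):
        return None
    with open(path) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # line looks like "VmRSS:  12345 kB"
    return None
```

Polling this for the runaway pid would at least tell you how long you have before the crash.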

Still, I suspect you should rather look at this from within Python and do some runtime logging and debugging.

gilligan
A: 

This answer has info on attaching gdb to a running Python process, with macros that will drop you into a pdb session in that process. I haven't tried it myself, but it got 20 votes. It sounds like you might end up hanging the app, but that seems worth the risk in your case.

intuited