Hi,

I have a Python interpreter embedded inside an application. The application takes a long time to start up, and I cannot restart the interpreter without restarting the whole application. What I would like to do is save the state of the interpreter and be able to return to that state easily.

I started by storing the names of all modules in sys.modules that the Python interpreter started with, and then deleting all newer modules from sys.modules when requested. This appears to make the interpreter willing to re-import the same modules even though it has already imported them before. However, it doesn't work in all situations, for example with singleton classes, static methods, etc.
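A minimal sketch of the snapshot-and-purge idea described above. The helper names snapshot_modules and restore_modules are hypothetical, chosen only for illustration, and this is not the code used in the application:

```python
import sys


def snapshot_modules():
    """Record the names of every module loaded at this moment."""
    return set(sys.modules)


def restore_modules(baseline):
    """Delete every sys.modules entry added since the snapshot.

    Returns the list of removed names. Objects created from the removed
    modules (singletons, instances, bound callbacks) stay alive and keep
    pointing at the old module code, which is where this approach breaks.
    """
    removed = [name for name in list(sys.modules) if name not in baseline]
    for name in removed:
        del sys.modules[name]
    return removed


# Taken once, right after the interpreter starts.
baseline = snapshot_modules()
```

Calling restore_modules(baseline) later only makes the next import statement re-execute the module; it does not undo anything the old module already did.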

I'd rather not embed another interpreter inside this interpreter if it can be avoided, as I would lose easy access to the application's API (and, I imagine, take a slight speed hit as well).

So, does anyone know of a way I could store the interpreter's state and then return to this so that it can cope with all situations?

Thanks,

Dan

A: 

One very hacky and bug-prone approach might be a C module that simply copies the memory to a file so it can be loaded back the next time. But since I can't imagine that this would always work properly, would pickling be an alternative?

If you are able to make all of your modules picklable, then you should be able to pickle everything in globals() so it can be reloaded again.

A: 

If you know in advance which modules, classes, functions, variables, etc. are in use, you could pickle them to disk and reload them. Off the top of my head, I'm not sure of the best way to tackle the issue if your environment contains many unknowns, though it may suffice to pickle globals and locals.
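As a rough sketch of the "pickle globals" idea, assuming only plain data needs to survive: the helpers save_namespace and load_namespace below are hypothetical names, and anything pickle refuses (modules, lambdas, open files and so on) is skipped and reported rather than saved.

```python
import pickle


def save_namespace(namespace, path):
    """Pickle every entry of `namespace` that pickle can handle.

    Returns the names that had to be skipped.
    """
    saved, skipped = {}, []
    for name, value in namespace.items():
        if name.startswith('__'):
            continue  # leave interpreter bookkeeping such as __builtins__ alone
        try:
            pickle.dumps(value)  # trial run: can this value be pickled at all?
        except Exception:
            skipped.append(name)
        else:
            saved[name] = value
    with open(path, 'wb') as f:
        pickle.dump(saved, f)
    return skipped


def load_namespace(namespace, path):
    """Merge a previously saved state back into `namespace`."""
    with open(path, 'rb') as f:
        namespace.update(pickle.load(f))
```

Typical use would be save_namespace(globals(), 'state.pickle') before the experiment and load_namespace(globals(), 'state.pickle') afterwards. Note that functions and classes are pickled by reference to their defining module, so they are not really "saved" in any deep sense.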

daniel
+2  A: 

Try this code from ActiveState recipes: http://code.activestate.com/recipes/572213/

It extends pickle so it supports pickling anything defined in the shell console. Theoretically you should just be able to pickle the main module, according to their documentation:

import savestate, pickle, __main__
# Protocol 2, as in the recipe; the with-block closes the file afterwards.
with open('savestate.pickle', 'wb') as f:
    pickle.dump(__main__, f, 2)
Daniel
This looks promising thanks, I'll look into this in a little more depth.
Dan
+1  A: 

I'd suggest tackling the root cause of the problem.

"The application takes a long time to start up and I have no ability to restart the interpreter without restarting the whole application"

I doubt this is actually 100% true. If the overall application is the result of an act of Congress, okay, it can't be changed. But if the overall application was written by real people, then finding and moving the code to restart the Python interpreter should be possible. It's cheaper, simpler and more reliable than anything else you might do to hack around the problem.

S.Lott
Not in this case - the application is a major piece of software released only once a year. Until this functionality gets introduced, a hack is the only solution available to speed up development.
Dan
Got it. Spending $100K of clever programmer time now to create a hack; another $100K to debate taking it back out and doing it better and another $100K to do it right. Instead of $10K to notify customers of an important change to their software.
S.Lott
I wish my time was worth that much! This is something I'm going to have to implement in a free hour or two to try and make my life a little bit easier. If it doesn't work well, we'll carry on using the existing system...
Dan
+1  A: 

storing the names of all modules in sys.modules that the python interpreter started with and then deleting all new modules from sys.modules when requested. This appears to make the interpreter prepared to re-import the same modules even though it has already imported them before.

The module-reload-forcing approach can be made to work in some circumstances but it's a bit hairy. In summary:

  • You need to make sure that all modules that have dependencies on each other are all reloaded at once. So any module 'x' that does 'import y' or 'from y import ...' must be deleted from sys.modules at the same time as module 'y'.

  • This process will need protecting with a lock if your app or any other active module is using threads.

  • Any module that leaves hooks pointing to itself in other modules cannot usefully be reloaded as references to the old module will remain in unreloaded/unreloadable code. This includes stuff like exception hooks, signals, warnings filters, encodings, monkey-patches and so on. If you start blithely reloading modules containing other people's code you might be surprised how often they do stuff like that, potentially resulting in subtle and curious errors.

So to get it to work you need to have well-defined boundaries between interdependent modules - "was it imported at initial start-up time" probably isn't quite good enough - and to make sure they're nicely encapsulated without unexpected dependencies like monkey-patching.

This can be based on folder, so for example anything in /home/me/myapp/lib could be reloaded as a unit, whilst leaving other modules alone - especially the contents of the stdlib in e.g. /usr/lib/python2.x/, which in general cannot be reloaded reliably. I've got code for this in an as-yet-unreleased webapp reloading wrapper, if you need it.

Finally:

  • You need to know a little bit about the internals of sys.modules, specifically that it leaves a bunch of 'None' values to signify failed relative imports. If you don't delete them at the same time as you delete your other module values, the subsequent attempt to import a module can (sometimes) end up importing 'None', leading to confusing errors.

This is a nasty implementation detail which might change and break your app in some future Python version, but that is the price for playing with sys.modules in unsupported ways.
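The points above (reload a folder as a unit, guard the operation with a lock, clear the leftover None entries) might be sketched as follows. This is not the unreleased wrapper mentioned earlier; purge_modules_under and _reload_lock are hypothetical names, and the lock only serialises purges against each other, it does not stop another thread from importing at the same time.

```python
import os
import sys
import threading

# Assumption: every caller that purges modules goes through this one lock.
_reload_lock = threading.Lock()


def purge_modules_under(folder):
    """Remove from sys.modules every module whose file lives under `folder`.

    None placeholders registered beneath a purged package are removed as
    well. Returns the list of removed names.
    """
    root = os.path.join(os.path.abspath(folder), '')  # ensure trailing separator
    removed = []
    with _reload_lock:
        # First pass: real modules loaded from files below the folder.
        for name, module in list(sys.modules.items()):
            path = getattr(module, '__file__', None)
            if path and os.path.abspath(path).startswith(root):
                del sys.modules[name]
                removed.append(name)
        # Second pass: None entries such as 'mypkg.os' left under purged packages.
        purged = set(removed)
        for name, module in list(sys.modules.items()):
            if module is None and name.rpartition('.')[0] in purged:
                del sys.modules[name]
                removed.append(name)
    return removed
```

Modules outside the folder, including the stdlib, are left untouched, which matches the advice to keep the reload boundary narrow.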

bobince
I think the problem with this approach is maintaining state. A module has its own namespace (as do the functions and classes inside it), so any function call could change that state. Even the order of imports could be significant if the module ran code in its global namespace (on import).
Daniel