I need to read all modules (pre-compiled) from a zipfile (built by py2exe compressed) into memory and then load them all. I know this can be done by loading direct from the zipfile but I need to load them from memory. Any ideas? (I'm using python 2.5.2 on windows) TIA Steve
It depends on what exactly you have as "the module (pre-compiled)". Let's assume it's exactly the contents of a .pyc
file, e.g., ciao.pyc
as built by:
$ cat>'ciao.py'
def ciao(): return 'Ciao!'
$ python -c'import ciao; print ciao.ciao()'
Ciao!
IOW, having thus built ciao.pyc
, say that you now do:
$ python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('ciao.pyc', 'rb').read()
>>> len(b)
200
and your goal is to go from that byte string b
to an importable module ciao
. Here's how:
>>> import marshal
>>> c = marshal.loads(b[8:])
>>> c
<code object <module> at 0x65188, file "ciao.py", line 1>
this is how you get the code object from the .pyc
binary contents. Edit: if you're curious, the first 8 bytes are a "magic number" and a timestamp -- not needed here (unless you want to sanity-check them and raise exceptions if warranted, but that seems outside the scope of the question; marshal.loads
will raise anyway if it detects a corrupt string).
Then:
>>> import types
>>> m = types.ModuleType('ciao')
>>> import sys
>>> sys.modules['ciao'] = m
>>> exec c in m.__dict__
i.e: make a new module object, install it in sys.modules
, populate it by executing the code object in its __dict__
. Edit: the order in which you do the sys.modules
insertion and exec
matters if and only if you may have circular imports -- but, this is the order Python's own import
normally uses, so it's better to mimic it (which has no specific downsides).
You can "make a new module object" in several ways (e.g., from functions in standard library modules such as new
and imp
), but "call the type to get an instance" is the normal Python way these days, and the normal place to obtain the type from (unless it has a built-in name or you otherwise have it already handy) is from the standard library module types
, so that's what I recommend.
Now, finally:
>>> import ciao
>>> ciao.ciao()
'Ciao!'
>>>
...you can import the module and use its functions, classes, and so on. Other import
(and from
) statements will then find the module as sys.modules['ciao']
, so you won't need to repeat this sequence of operations (indeed you don't need this last import
statement here if all you want is to ensure the module is available for import from elsewhere -- I'm adding it only to show it works;-).
Edit: If you absolutely must import in this way packages and modules therefrom, rather than "plain modules" as I just showed, that's doable, too, but a bit more complicated. As this answer is already pretty long, and I hope you can simplify your life by sticking to plain modules for this purpose, I'm going to shirk that part of the answer;-).
Also note that this may or may not do what you want in cases of "loading the same module from memory multiple times" (this rebuilds the module each time; you might want to check sys.modules and just skip everything if the module's already there) and in particular when such repeated "load from memory" occurs from multiple threads (needing locks -- but, a better architecture is to have a single dedicated thread devoted to performing the task, with other modules communicating with it via a Queue).
Finally, there's no discussion of how to install this functionality as a transparent "import hook" which automagically gets involved in the mechanisms of the import
statement internals themselves -- that's feasible, too, but not exactly what you're asking about, so here, too, I hope you can simplify your life by doing things the simple way instead, as this answer outlines.
Compiled Python file consist of
- magic number (4 bytes) to determine type and version of Python,
- timestamp (4 bytes) to check whether we have newer source,
- marshaled code object.
To load module you have to create module object with imp.new_module()
, execute unmashaled code in new module's namespace and put it in sys.modules
. Below in sample implementation:
import sys, imp, marshal
def load_compiled_from_memory(name, filename, data, ispackage=False):
if data[:4]!=imp.get_magic():
raise ImportError('Bad magic number in %s' % filename)
# Ignore timestamp in data[4:8]
code = marshal.loads(data[8:])
imp.acquire_lock() # Required in threaded applications
try:
mod = imp.new_module(name)
sys.modules[name] = mod # To handle circular and submodule imports
# it should come before exec.
try:
mod.__file__ = filename # Is not so important.
# For package you have to set mod.__path__ here.
# Here I handle simple cases only.
if ispackage:
mod.__path__ = [name.replace('.', '/')]
exec code in mod.__dict__
except:
del sys.modules[name]
raise
finally:
imp.release_lock()
return mod
Update: the code is updated to handle packages properly.
Note that you have to install import hook to handle imports inside loaded modules. One way to do this is adding your finder into sys.meta_path
. See PEP302 for more information.