I need to read all the (pre-compiled) modules from a zipfile (built by py2exe, compressed) into memory and then load them all. I know this can be done by loading directly from the zipfile, but I need to load them from memory. Any ideas? (I'm using Python 2.5.2 on Windows.) TIA, Steve

+11  A: 

It depends on what exactly you have as "the module (pre-compiled)". Let's assume it's exactly the contents of a .pyc file, e.g., ciao.pyc as built by:

$ cat>'ciao.py'
def ciao(): return 'Ciao!' 
$ python -c'import ciao; print ciao.ciao()'
Ciao!

IOW, having thus built ciao.pyc, say that you now do:

$ python
Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('ciao.pyc', 'rb').read()
>>> len(b)
200

and your goal is to go from that byte string b to an importable module ciao. Here's how:

>>> import marshal
>>> c = marshal.loads(b[8:])
>>> c
<code object <module> at 0x65188, file "ciao.py", line 1>

This is how you get the code object from the .pyc binary contents. Edit: if you're curious, the first 8 bytes are a 4-byte "magic number" and a 4-byte timestamp -- not needed here (unless you want to sanity-check them and raise exceptions if warranted, but that seems outside the scope of the question; marshal.loads will raise anyway if it detects a corrupt string).
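For readers on a current interpreter, here is a minimal sketch of the same step, assuming CPython 3.7 or later, where the .pyc header is 16 bytes rather than the 8 bytes of Python 2.5. The helper name `code_from_pyc_bytes` and the constant `HEADER_SIZE` are made up for illustration, and the byte string is built in memory so the example needs no files.

```python
import importlib.util
import marshal

# Assumption: CPython 3.7+, whose .pyc header is 16 bytes.
# The Python 2.5 session in this answer skips 8 bytes instead.
HEADER_SIZE = 16


def code_from_pyc_bytes(data, name="<memory>"):
    """Return the code object stored in the raw bytes of a .pyc file."""
    # Optional sanity check: the first 4 bytes identify the Python version.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ImportError("bad magic number in %r" % name)
    # Everything after the header is the marshalled code object.
    return marshal.loads(data[HEADER_SIZE:])


# Build a pyc-shaped byte string in memory for the demo:
# magic number, 12 zeroed header bytes, then the marshalled code.
source = "def ciao(): return 'Ciao!'\n"
blob = (importlib.util.MAGIC_NUMBER + b"\x00" * 12
        + marshal.dumps(compile(source, "ciao.py", "exec")))

code = code_from_pyc_bytes(blob)
print(code.co_filename)  # ciao.py
```

The magic-number check is the only validation done here; the remaining header bytes are ignored, just as the timestamp is ignored in the session above.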

Then:

>>> import types
>>> m = types.ModuleType('ciao')
>>> import sys
>>> sys.modules['ciao'] = m
>>> exec c in m.__dict__

That is: make a new module object, install it in sys.modules, and populate it by executing the code object in its __dict__. Edit: the order in which you do the sys.modules insertion and exec matters if and only if you may have circular imports -- but this is the order Python's own import normally uses, so it's better to mimic it (which has no specific downsides).

You can "make a new module object" in several ways (e.g., from functions in standard library modules such as new and imp), but "call the type to get an instance" is the normal Python way these days, and the normal place to obtain the type from (unless it has a built-in name or you otherwise have it already handy) is from the standard library module types, so that's what I recommend.
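The three steps can be wrapped in a small helper. This is only a sketch, assuming Python 3 (where `exec` is a function rather than a statement); the name `install_module` and the module name `ciao_demo` are hypothetical, and the cleanup on failure is an addition not shown in the interactive session.

```python
import sys
import types


def install_module(name, code):
    """Create module `name`, register it, then run `code` inside it."""
    module = types.ModuleType(name)
    # Register before executing, the same order the import system uses,
    # so circular imports can already find the module.
    sys.modules[name] = module
    try:
        # Python 3 spelling of the Python 2 statement `exec c in m.__dict__`.
        exec(code, module.__dict__)
    except BaseException:
        # Do not leave a half-initialized module behind.
        sys.modules.pop(name, None)
        raise
    return module


code = compile("def ciao(): return 'Ciao!'\n", "ciao.py", "exec")
install_module("ciao_demo", code)

import ciao_demo  # found in sys.modules, nothing is read from disk

print(ciao_demo.ciao())  # Ciao!
```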

Now, finally:

>>> import ciao
>>> ciao.ciao()
'Ciao!'
>>>

...you can import the module and use its functions, classes, and so on. Other import (and from) statements will then find the module as sys.modules['ciao'], so you won't need to repeat this sequence of operations (indeed you don't need this last import statement here if all you want is to ensure the module is available for import from elsewhere -- I'm adding it only to show it works;-).

Edit: If you absolutely must import in this way packages and modules therefrom, rather than "plain modules" as I just showed, that's doable, too, but a bit more complicated. As this answer is already pretty long, and I hope you can simplify your life by sticking to plain modules for this purpose, I'm going to shirk that part of the answer;-).

Also note that this may or may not do what you want in cases of "loading the same module from memory multiple times" (this rebuilds the module each time; you might want to check sys.modules and just skip everything if the module's already there) and in particular when such repeated "load from memory" occurs from multiple threads (needing locks -- but, a better architecture is to have a single dedicated thread devoted to performing the task, with other modules communicating with it via a Queue).
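As a rough illustration of the "check sys.modules and skip" idea combined with the lock-based variant (not the dedicated-thread design), here is a sketch assuming Python 3. The function `load_once` and the lock `_load_lock` are hypothetical names introduced for this example.

```python
import sys
import threading
import types

# Hypothetical lock for this sketch. It is re-entrant so that code being
# executed may itself call load_once() on the same thread.
_load_lock = threading.RLock()


def load_once(name, code):
    """Return sys.modules[name], building it from `code` only on first use."""
    with _load_lock:
        if name in sys.modules:
            return sys.modules[name]  # already loaded: skip everything
        module = types.ModuleType(name)
        sys.modules[name] = module
        try:
            exec(code, module.__dict__)
        except BaseException:
            del sys.modules[name]
            raise
        return module


code = compile("value = 42\n", "<memory:once_demo>", "exec")
first = load_once("once_demo", code)
second = load_once("once_demo", code)
print(first is second)  # True: the second call reuses sys.modules
```

A single lock serializes every load, which is simple but coarse; the dedicated-thread-plus-Queue design mentioned above avoids holding a lock while module code runs.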

Finally, there's no discussion of how to install this functionality as a transparent "import hook" which automagically gets involved in the mechanisms of the import statement internals themselves -- that's feasible, too, but not exactly what you're asking about, so here, too, I hope you can simplify your life by doing things the simple way instead, as this answer outlines.

Alex Martelli
Just curious, will it also work for .pyd files?
S.Mark
I was having the same requirement and was looking for a solution, thanks Alex :)
Technofreak
@S.Mark, no, `.pyc` or `.pyo` only -- `.pyd`s are essentially `.dll`s under a false name and present **completely** different problems. @Technofreak, you're welcome!-)
Alex Martelli
Almost no explanation, a lot of issues are not addressed. This doesn't look like an answer from Alex Martelli :(.
Denis Otkidach
@Denis, I've edited to add bits and pieces of trivia, but I'm very dubious as to whether this actually enhances the answer. Yes, there are many more issues (some of which you address, such as the possible need for locking when the same module's imported from multiple threads, and others you don't, such as other repeated imports, packages and dotted-names modules therefrom, etc) -- but, there's a limit to the useful length of a practical answer, and I chose to use space to show details at the interactive prompt rather than for marginalia.
Alex Martelli
Thanks for the example and quick reply. I have been using your idea and it works great for modules defined in single files but what should I do for files in packages (__init__ etc) in the zip? I guess this is where it gets a bit complex?
Steve
@Steve, yep, it's one of the things that can make this reasonably simple code more complicated:-(. What about posting a separate Q specifying which of the various possible complications I listed in this A you absolutely _must_ address in your app...?
Alex Martelli
@Alex: I used to learn a lot from your answers. The first version of the answer was good, but not as good as I expected from you.
Denis Otkidach
+4  A: 

A compiled Python file consists of

  1. magic number (4 bytes) to determine type and version of Python,
  2. timestamp (4 bytes) to check whether we have newer source,
  3. marshaled code object.

To load a module you have to create a module object with imp.new_module(), put it in sys.modules, and execute the unmarshaled code in the new module's namespace. Below is a sample implementation:

import sys, imp, marshal

def load_compiled_from_memory(name, filename, data, ispackage=False):
    if data[:4]!=imp.get_magic():
        raise ImportError('Bad magic number in %s' % filename)
    # Ignore timestamp in data[4:8]
    code = marshal.loads(data[8:])
    imp.acquire_lock() # Required in threaded applications
    try:
        mod = imp.new_module(name)
        sys.modules[name] = mod # To handle circular and submodule imports 
                                # it should come before exec.
        try:
            mod.__file__ = filename # Is not so important.
            # For package you have to set mod.__path__ here. 
            # Here I handle simple cases only.
            if ispackage:
                mod.__path__ = [name.replace('.', '/')]
            exec code in mod.__dict__
        except:
            del sys.modules[name]
            raise
    finally:
        imp.release_lock()
    return mod

Update: the code is updated to handle packages properly.

Note that you have to install an import hook to handle imports inside the loaded modules. One way to do this is to add your finder to sys.meta_path. See PEP 302 for more information.
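A finder on sys.meta_path might look like the sketch below. It assumes Python 3.7 or later, where finders implement `find_spec` and loaders implement `exec_module`, rather than the `find_module`/`load_module` pair that PEP 302 defined for Python 2.5. The class `MemoryFinder` and the names `mempkg` and `mempkg.greet` are invented for this example, and the code objects are compiled from source strings instead of being unmarshalled from .pyc data.

```python
import importlib.abc
import importlib.util
import sys


class MemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from a dict mapping dotted names to code objects."""

    def __init__(self, code_by_name, packages=()):
        self._code = dict(code_by_name)
        self._packages = set(packages)

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._code:
            return None  # not ours: let the other finders try
        return importlib.util.spec_from_loader(
            fullname, self, is_package=fullname in self._packages)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(self._code[module.__name__], module.__dict__)


# Hypothetical in-memory package with one submodule.
sources = {
    "mempkg": "from mempkg.greet import ciao\n",
    "mempkg.greet": "def ciao(): return 'Ciao!'\n",
}
codes = {name: compile(src, "<memory:%s>" % name, "exec")
         for name, src in sources.items()}
sys.meta_path.insert(0, MemoryFinder(codes, packages=["mempkg"]))

import mempkg  # the import inside mempkg is resolved by the finder too

print(mempkg.ciao())  # Ciao!
```

Because the import system drives the finder, it also takes care of sys.modules registration, cleanup on failure, and per-module locking, so none of that appears in the sketch.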

Denis Otkidach
Thanks for the example and quick reply.
Steve