views:

804

answers:

4

Is there a way to serialize a lexical closure in Python using the standard library? pickle and marshal appear not to work with lexical closures. I don't really care about the details of binary vs. string serialization, etc., it just has to work. For example:

def foo(bar, baz) :
    def closure(waldo) :
        return baz * waldo
    return closure

I'd like to just be able to dump instances of closure to a file and read them back.

Edit: One relatively obvious way that this could be solved is with some reflection hacks to convert lexical closures into class objects and vice-versa. One could then convert to classes, serialize, unserialize, convert back to closures. Heck, given that Python is duck typed, if you overloaded the function call operator of the class to make it look like a function, you wouldn't even really need to convert it back to a closure and the code using it wouldn't know the difference. If any Python reflection API gurus are out there, please speak up.

+6  A: 

If you simply use a class with a __call__ method to begin with, it should all work smoothly with pickle.

class foo:
    def __init__(self, bar, baz):
        self.baz = baz
    def __call__(self,waldo):
        return self.baz * waldo

On the other hand, a hack which converted a closure into an instance of a new class created at runtime would not work, because of the way pickle deals with classes and instances. pickle doesn't store classes; only a module name and class name. When reading back an instance or class it tries to import the module and find the required class in it. If you used a class created on-the-fly, you're out of luck.

If you're willing to use custom code to serialize/unserialize, I believe you could begin with a real closure and finish up with a reasonable facsimile. If that fits your constraints I might be able to find a workable approach.

Greg Ball
A: 

Recipe 500261: Named Tuples contains a function that defines a class on-the-fly. And this class supports pickling.

Here's the essence:

result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')

Combined with @Greg Ball's suggestion to create a new class at runtime it might answer your question.

J.F. Sebastian
A: 

Yes! I got it (at least I think) -- that is, the more generic problem of pickling a function. Python is so wonderful :), I found out most of it though the dir() function and a couple of web searches. Also wonderful to have it [hopefully] solved, I needed it also.

I haven't done a lot of testing on how robust this co_code thing is (nested fcns, etc.), and it would be nice if someone could look up how to hook Python so functions can be pickled automatically (e.g. they might sometimes be closure args).

Cython module _pickle_fcn.pyx

# -*- coding: utf-8 -*-

cdef extern from "Python.h":
    object PyCell_New(object value)

def recreate_cell(value):
    return PyCell_New(value)

Python file

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author gatoatigrado [ntung.com]
import cPickle, marshal, types
import pyximport; pyximport.install()
import _pickle_fcn

def foo(bar, baz) :
    def closure(waldo) :
        return baz * waldo
    return closure

# really this problem is more about pickling arbitrary functions
# thanks so much to the original question poster for mentioning marshal
# I probably wouldn't have found out how to serialize func_code without it.
fcn_instance = foo("unused?", -1)
code_str = marshal.dumps(fcn_instance.func_code)
name = fcn_instance.func_name
defaults = fcn_instance.func_defaults
closure_values = [v.cell_contents for v in fcn_instance.func_closure]
serialized = cPickle.dumps((code_str, name, defaults, closure_values),
    protocol=cPickle.HIGHEST_PROTOCOL)

code_str_, name_, defaults_, closure_values_ = cPickle.loads(serialized)
code_ = marshal.loads(code_str_)
closure_ = tuple([_pickle_fcn.recreate_cell(v) for v in closure_values_])
# reconstructing the globals is like pickling everything :)
# for most functions, it's likely not necessary
# it probably wouldn't be too much work to detect if fcn_instance global element is of type
# module, and handle that in some custom way
# (have the reconstruction reinstantiate the module)
reconstructed = types.FunctionType(code_, globals(),
    name_, defaults_, closure_)
print(reconstructed(3))

cheers,
Nicholas

EDIT - more robust global handling is necessary for real-world cases. fcn.func_code.co_names lists global names.

gatoatigrado
A: 
#!python

import marshal, pickle, new

def dump_func(f):
    if f.func_closure:
        closure = tuple(c.cell_contents for c in f.func_closure)
    else:
        closure = None
    return marshal.dumps(f.func_code), f.func_defaults, closure


def load_func(code, defaults, closure, globs):
    if closure is not None:
        closure = reconstruct_closure(closure)
    code = marshal.loads(code)
    return new.function(code, globs, code.co_name, defaults, closure)


def reconstruct_closure(values):
    ns = range(len(values))
    src = ["def f(arg):"]
    src += [" _%d = arg[%d]" % (n, n) for n in ns]
    src += [" return lambda:(%s)" % ','.join("_%d"%n for n in ns), '']
    src = '\n'.join(src)
    try:
        exec src
    except:
        raise SyntaxError(src)
    return f(values).func_closure




if __name__ == '__main__':

    def get_closure(x):
        def the_closure(a, b=1):
            return a * x + b, some_global
        return the_closure

    f = get_closure(10)
    code, defaults, closure = dump_func(f)
    dump = pickle.dumps((code, defaults, closure))
    code, defaults, closure = pickle.loads(dump)
    f = load_func(code, defaults, closure, globals())

    some_global = 'some global'

    print f(2)
avdd