I need a way to "inject" names into a function from an outer code block, so that they are accessible locally and don't need to be handled explicitly by the function's code (defined as parameters, unpacked from `*args`, etc.).

The simplified scenario: providing a framework within which the users are able to define (with as little syntax as possible) custom functions to manipulate other objects of the framework (which are not necessarily global).

Ideally, the user defines

def user_func():
    Mouse.eat(Cheese)
    if Cat.find(Mouse):
        Cat.happy += 1

Here Cat, Mouse and Cheese are framework objects that, for good reasons, cannot be bound to the global namespace.

I want to write a wrapper for this function that behaves like this:

def framework_wrap(user_func):
    # this is a framework internal and has name bindings to Cat, Mouse and Cheese
    def f():
        inject(user_func, {'Cat': Cat, 'Mouse': Mouse, 'Cheese': Cheese})
        user_func()
    return f

Then this wrapper could be applied to all user-defined functions (as a decorator, either manually by the user or automatically; I plan to use a metaclass).

@framework_wrap
def user_func():

I am aware of Python 3's nonlocal keyword, but I still consider it ugly (from the framework user's perspective) to add an additional line:

nonlocal Cat, Mouse, Cheese

and to have to worry about keeping every needed object listed on that line.

Any suggestion is greatly appreciated.

+1  A: 

If your application is strictly Python 3, I don't see how using Python 3's nonlocal is any uglier than writing a decorator to manipulate a function's local namespace. I say give the nonlocal solution a try or rethink this strategy.

jathanism
I presented the decorator approach mainly for its simplicity. I want the wrapper for the functions to be applied via a metaclass, so it won't be necessary for users to manually apply the decorator. I'd also like to keep the project backwards compatible with Python 2.x (>=2.6).
amadaeus
+2  A: 

Sounds like you may want to be using `exec code in dict`, where `code` is the user's function and `dict` is a dictionary you provide, which can

  • be pre-filled with references to objects that the user code should be able to use
  • store any functions or variables declared by the user's code for later use by your framework.

Docs for exec: http://docs.python.org/reference/simple_stmts.html#the-exec-statement
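A minimal sketch of the exec-in-dict idea (written with Python 3's exec() call syntax; the names and values here are made up for illustration):

```python
# Hypothetical user code arriving as a string
user_source = """
def user_func():
    return Cat  # resolved through the dict given to exec
"""

# Pre-fill the namespace with a stand-in framework object
namespace = {'Cat': 'framework-cat'}
exec(user_source, namespace)

# The user's function is now stored in the dict for later use
result = namespace['user_func']()
```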

However, I'm pretty sure that this would only work if the user's code is being brought in as a string and you need to exec it. If the function is already compiled, it will already have its global bindings set. So doing something like exec "user_func(*args)" in framework_dict won't work, because user_func's globals are already set to the module in which it was defined.

Since func_globals is readonly, I think you'll have to do something like what martineau suggests in order to modify the function globals.

I think it likely (unless you're doing something unprecedentedly awesome, or I'm missing some critical subtlety) that you would be better off putting your framework objects into a module, and then having the user code import that module. Module variables can be reassigned, mutated, or accessed quite readily by code defined outside of that module, once the module has been imported.

I think this would be better for code readability also, because user_func ends up with explicit namespacing for Cat, Dog, etc., rather than readers unfamiliar with your framework having to wonder where they came from, e.g. animal_farm.Mouse.eat(animal_farm.Cheese), or lines like

from animal_farm import Goat
cheese = make_cheese(Goat().milk())
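The module-based organization might be sketched like this (the module is built dynamically with types.ModuleType only so the example is self-contained; in practice animal_farm would be an ordinary .py file shipped by the framework):

```python
import sys
import types

# Stand-in for a real animal_farm.py module provided by the framework
animal_farm = types.ModuleType('animal_farm')
animal_farm.Cheese = 'cheddar'
sys.modules['animal_farm'] = animal_farm

# User code can now reference framework objects with explicit namespacing
from animal_farm import Cheese
```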

If you are doing something unprecedentedly awesome, I think you'll need to use the C API to pass arguments to a code object. It looks like the function PyEval_EvalCodeEx is the one you want.

intuited
I like how clean your approach is. A few problems, though: in order to avoid additional code compilation, I'd like to exec user_func.func_code (the code object), but I can't find any way to pass additional arguments to the user_func call (if needed by the function's definition). Another potential problem is the handling of globals in certain scenarios, but that's not really an issue for now.
amadaeus
If you add `code` to `dict`, you can `exec "code(parameters)" in dict`.
Zooba
But of course, you're not avoiding additional compilation in that case, my bad. Though if you have performance(?) concerns about compiling a simple function call, a (mostly) interpreted language is not the best choice anyway.
Zooba
Hmmm.. you mean you want to pass some parameters to a function declared with a known name in the user code? You can call into the resulting dictionary, i.e. `dict_passed_to_exec['name_of_user_function'](*args, **kwargs)`.
intuited
Also: should it be more suitable, you can load user code into a module; see [here](http://stackoverflow.com/questions/3799545/dynamically-importing-python-module/3799609#3799609) for details.
intuited
@intuited I know how to _call_ a function with parameters if I have a reference to that function. What I don't know is how to call a function (or func_code object) **via exec** and passing additional parameters (excluding Zooba's string eval solution).
amadaeus
That's what I'm saying, though. You use `exec` to run code in which there's a function declaration, then you call the declared function from your main code after the `exec`. Or you pass the parameter in to the exec block like a kwarg by storing it in a dictionary key. Both of these approaches are a bit awkward in that they rely on a convention of giving the function or variable a particular name. If you want to avoid this, and you know that there's only going to be one function declared in the exec block, you can check the dict's values for a callable instead of looking for a particular key.
intuited
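The name-convention and scan-for-a-callable variants described above can be sketched as follows (Python 3 syntax; the function name userf is hypothetical):

```python
ns = {}
exec("def userf(a, b):\n    return a + b", ns)

# Call the declared function by its agreed-upon name, passing args normally
by_name = ns['userf'](2, 3)

# Or, if no name was agreed upon, scan the dict for the lone user callable
# (skipping the __builtins__ entry that exec inserts)
funcs = [v for k, v in ns.items() if callable(v) and not k.startswith('__')]
```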
Maybe I'm missing something though.. if you can post an example of code that you'd like to parameterize this way, I might understand better. For your second code sample in the question, it's just a straightforward translation of your `inject` to `exec`, rearranging the syntax appropriately.
intuited
Oh wait, you want to pass parameters through to the wrapped function, which has already been declared. Sorry, I think I missed the point of the question the first time through.
intuited
@intuited If the user defines `userf(arg1, arg2)` then I want something like `exec userf.func_code with_args (arg1, arg2) in dic`. I know I can render a string `s="userf(arg1, arg2)"` and then `exec s in dic`, but this involves additional compilation and... gives me an unpleasant feeling :)
amadaeus
Hmm, this turned out to be pretty interesting. I'm pretty sure that you can't pass arguments to a code object in straight python code; you'd need to use the C API. I gather that the function you'd want to call is [`PyEval_EvalCodeEx`](http://docs.python.org/c-api/veryhigh.html#PyEval_EvalCodeEx). `exec` calls this function indirectly via the simplified C function `PyEval_EvalCode`, which doesn't take arguments to the code object as arguments. I did a bit of grepping around Python's C source code, and it doesn't look like there are any Python functions that pass args to code objects.
intuited
I think it would almost certainly be better just to organize your framework in a different way. For example, putting the various animals into a module, and then having users import that module in order to access those objects, would likely accomplish what you want, probably in a clearer manner. So you would end up with the user function doing `framework.Cat.meow("rrowr")` or whatever.
intuited
Also, I don't think the `exec` approach will get you what you want in this case, because in order for the user function's globals to be bound to your dictionary, you would have to `exec` the function *declaration*, not the function *call*. I think there might be some way to switch the module that it thinks it's in, which could have the effect of rebinding its globals, which might work out. But that attribute of the function object (`func_globals`) is read-only, so maybe not.
intuited
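One way around read-only func_globals, sketched here with Python 3 attribute spellings (__code__, __globals__): construct a new function object around the same code object but with a globals dict you control. The names below are invented for the example:

```python
import types

Cat = 'module-cat'  # stand-in for a value in the defining module's globals

def user_func():
    return Cat  # looked up in the function's globals at call time

# Build a sibling function that shares the code but not the globals
injected = {'Cat': 'framework-cat'}
rebound = types.FunctionType(user_func.__code__, injected,
                             user_func.__name__, user_func.__defaults__,
                             user_func.__closure__)
```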
I think initially I had the impression that the user code was coming in as a string, in which case you could `exec` it in a dictionary to determine its bindings.
intuited
+1  A: 

Edited answer -- restores namespace dict after calling user_func()

Tested using Python 2.7 (and kept sub-optimal for the sake of readability):

# ===== framework.py

# framework objects
class Cat: pass
class Mouse: pass
class Cheese: pass

framework_namespace = {'Cat':Cat, 'Mouse':Mouse, 'Cheese':Cheese }

# framework decorator
def wrap(f):
    def wrapped_f(*args, **kwargs):
        # determine which names in framework's namespace collide and don't
        preexistent = [name for name in framework_namespace if name in f.func_globals]
        nonexistent = [name for name in framework_namespace if name not in preexistent]

        # save any preexistent name's values
        f.globals_save = dict( (name, f.func_globals[name]) for name in preexistent )

        # temporarily inject framework's namespace
        f.func_globals.update(framework_namespace)

        retval = f(*args, **kwargs) # call function and save return value

        # clean up namespace
        for name in nonexistent:
            del f.func_globals[name] # remove those that didn't exist

        # restore the values of any names that collided
        f.func_globals.update(f.globals_save)

        return retval

    return wrapped_f

# ===== end framework.py

import framework

Cat = 42

@framework.wrap
def user_func():
    print 'in user_func():'
    print "  Cat:", Cat
    print "  Mouse:", Mouse
    print "  Cheese:", Cheese

user_func()
print
print 'after user_func():'
try:
    print "  Cat restored to", Cat
except NameError:
    print "no pre-existing Cat"

try:
    print "  Mouse restored to", Mouse
except NameError:
    print "<no pre-existing Mouse>"

try:
    print "  Cheese restored to", Cheese
except NameError:
    print "<no pre-existing Cheese>"

# output
#
# in user_func():
#   Cat: framework.Cat
#   Mouse: framework.Mouse
#   Cheese: framework.Cheese
#
# after user_func():
#   Cat restored to 42
#   Mouse restored to <no pre-existing Mouse>
#   Cheese restored to <no pre-existing Cheese>

I've left out a few bells and whistles, such as preserving the original function's name, docstring, and exact signature, since they aren't really relevant to the main question.

martineau
I stumbled upon this approach and quickly dismissed it because of http://docs.python.org/reference/datamodel.html#index-843 (func_globals is mentioned as being Read-Only). I know that means you can't reassign func_globals to another dict, but is it safe to modify it?
amadaeus
@amadaeus: Yeah, I saw the RO attribute indication, but took it to mean reassigning to another dict; it doesn't say to leave mutable values alone. @AaronMcSmooth: Thanks for fixing up the triple-quoted docstrings. I really hate StackOverflow's syntax highlighter, which doesn't realize it's handling Python...
martineau
@martineau I've just realized that f.func_globals is actually a reference to the globals() dictionary, so your code actually binds the names into the module's global namespace.
amadaeus
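The aliasing described here is easy to check (Python 3 spells the attribute __globals__; in Python 2 it is func_globals):

```python
def f():
    pass

# A function's globals mapping is the defining module's namespace itself,
# not a private copy, so mutating it mutates the module's globals
same = f.__globals__ is globals()
```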
@amadaeus: One definite problem with this is that the injected names are really being put into the module's namespace and are still there *after* calling `user_func()`. The decorator could clean them up if it was careful though (but would be a bit more complex).
martineau
@amadaeus: See modified answer.
martineau
@martineau It's working indeed, but I have second thoughts about the overhead of every wrapped_f call and the whole globals mess (e.g. with concurrent calls, as Ivo van der Wijk pointed out, this approach really needs a complete rework using thread locals).
amadaeus
@amadaeus: Yes, the overhead of providing the syntactic sugar you want properly -- as in dealing with name clashes and multi-threading -- seems a little high. Besides that, the whole idea of "injecting" things into a function's namespace seems a bit "un-Pythonic" to me.
martineau
+2  A: 

The more I mess around with the stack, the more I wish I hadn't. Don't hack globals to do what you want. Hack bytecode instead. There are two ways I can think of to do this.

1) Add cells wrapping the references that you want into f.func_closure. You have to reassemble the bytecode of the function to use LOAD_DEREF instead of LOAD_GLOBAL and generate a cell for each value. You then pass a tuple of the cells and the new code object to types.FunctionType and get a function with the appropriate bindings. Different copies of the function can have different local bindings so it should be as thread safe as you want to make it.

2) Add arguments for your new locals at the end of the function's argument list. Replace appropriate occurrences of LOAD_GLOBAL with LOAD_FAST. Then construct a new function with types.FunctionType, passing in the new code object and a tuple of the bindings that you want as the argument defaults. This is limited in the sense that Python caps functions at 255 arguments and it can't be used on functions that take variable arguments. Nonetheless it struck me as the more challenging of the two, so that's the one that I implemented (plus there's other stuff that can be done with this one). Again, you can either make different copies of the function with different bindings or call the function with the bindings that you want from each call location. So it too can be as thread safe as you want to make it.

import types
import opcode

# Opcode constants used for comparison and replacement
LOAD_FAST = opcode.opmap['LOAD_FAST']
LOAD_GLOBAL = opcode.opmap['LOAD_GLOBAL']
STORE_FAST = opcode.opmap['STORE_FAST']

DEBUGGING = True

def append_arguments(code_obj, new_locals):
    co_varnames = code_obj.co_varnames   # Old locals
    co_names = code_obj.co_names      # Old globals
    co_argcount = code_obj.co_argcount     # Argument count
    co_code = code_obj.co_code         # The actual bytecode as a string

    # Make one pass over the bytecode to identify names that should be
    # left in code_obj.co_names.
    not_removed = set(opcode.hasname) - set([LOAD_GLOBAL])
    saved_names = set()
    for inst in instructions(co_code):
        if inst[0] in not_removed:
            saved_names.add(co_names[inst[1]])

    # Build co_names for the new code object. This should consist of 
    # globals that were only accessed via LOAD_GLOBAL
    names = tuple(name for name in co_names
                  if name not in set(new_locals) - saved_names)

    # Build a dictionary that maps the indices of the entries in co_names
    # to their entry in the new co_names
    name_translations = dict((co_names.index(name), i)
                             for i, name in enumerate(names))

    # Build co_varnames for the new code object. This should consist of
    # the entirety of co_varnames with new_locals spliced in after the
    # arguments
    new_locals_len = len(new_locals)
    varnames = (co_varnames[:co_argcount] + new_locals +
                co_varnames[co_argcount:])

    # Build the dictionary that maps indices of entries in the old co_varnames
    # to their indices in the new co_varnames
    range1, range2 = xrange(co_argcount), xrange(co_argcount, len(co_varnames))
    varname_translations = dict((i, i) for i in range1)
    varname_translations.update((i, i + new_locals_len) for i in range2)

    # Build the dictionary that maps indices of deleted entries of co_names
    # to their indices in the new co_varnames
    names_to_varnames = dict((co_names.index(name), varnames.index(name))
                             for name in new_locals)

    if DEBUGGING:
        print "injecting: {0}".format(new_locals)
        print "names: {0} -> {1}".format(co_names, names)
        print "varnames: {0} -> {1}".format(co_varnames, varnames)
        print "names_to_varnames: {0}".format(names_to_varnames)
        print "varname_translations: {0}".format(varname_translations)
        print "name_translations: {0}".format(name_translations)


    # Now we modify the actual bytecode
    modified = []
    for inst in instructions(code_obj.co_code):
        # If the instruction is a LOAD_GLOBAL, we have to check to see if
        # it's one of the globals that we are replacing. Either way,
        # update its arg using the appropriate dict.
        if inst[0] == LOAD_GLOBAL:
            if DEBUGGING:
                print "LOAD_GLOBAL: {0}".format(inst[1])
            if inst[1] in names_to_varnames:
                if DEBUGGING:
                    print "replacing with {0}".format(names_to_varnames[inst[1]])
                inst[0] = LOAD_FAST
                inst[1] = names_to_varnames[inst[1]]
            elif inst[1] in name_translations:
                inst[1] = name_translations[inst[1]]
            else:
                raise ValueError("a name was lost in translation")
        # If it accesses co_varnames or co_names then update its argument.
        elif inst[0] in opcode.haslocal:
            inst[1] = varname_translations[inst[1]]
        elif inst[0] in opcode.hasname:
            inst[1] = name_translations[inst[1]]
        modified.extend(write_instruction(inst))

    code = ''.join(modified)
    # Done modifying codestring - make the code object

    return types.CodeType(co_argcount + new_locals_len,
                          code_obj.co_nlocals + new_locals_len,
                          code_obj.co_stacksize,
                          code_obj.co_flags,
                          code,
                          code_obj.co_consts,
                          names,
                          varnames,
                          code_obj.co_filename,
                          code_obj.co_name,
                          code_obj.co_firstlineno,
                          code_obj.co_lnotab)


def instructions(code):
    code = map(ord, code)
    i, L = 0, len(code)
    extended_arg = 0
    while i < L:
        op = code[i]
        i+= 1
        if op < opcode.HAVE_ARGUMENT:
            yield [op, None]
            continue
        oparg = code[i] + (code[i+1] << 8) + extended_arg
        extended_arg = 0
        i += 2
        if op == opcode.EXTENDED_ARG:
            extended_arg = oparg << 16
            continue
        yield [op, oparg]

def write_instruction(inst):
    op, oparg = inst
    if oparg is None:
        return [chr(op)]
    elif oparg < 65536L:
        return [chr(op), chr(oparg & 255), chr((oparg >> 8) & 255)]
    elif oparg < 4294967296L:
        return [chr(opcode.EXTENDED_ARG),
                chr((oparg >> 16) & 255),
                chr((oparg >> 24) & 255),
                chr(op),
                chr(oparg & 255),
                chr((oparg >> 8) & 255)]
    else:
        raise ValueError("Invalid oparg: {0} is too large".format(oparg))



if __name__=='__main__':
    import dis

    class Foo(object):
        y = 1

    z = 1
    def test(x):
        foo = Foo()
        foo.y = 1
        foo = x + y + z + foo.y
        print foo

    code_obj = append_arguments(test.func_code, ('y',))
    f = types.FunctionType(code_obj, test.func_globals, argdefs=(1,))
    if DEBUGGING:
        dis.dis(test)
        print '-'*20
        dis.dis(f)
    f(1)

Note that a whole branch of this code (the part relating to EXTENDED_ARG) is untested, but for common cases it seems to be pretty solid. I'll be hacking on it and am currently writing some code to validate the output. Then (when I get around to it) I'll run it against the whole standard library and fix any bugs.

I'll also probably be implementing the first option as well.
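For reference, the cells that option 1 would build by hand are the same mechanism the compiler already uses for ordinary closures; a small illustration (Python 3 attribute spelling):

```python
def make_user_func():
    Cat = 'cell-cat'          # free variable captured in a closure cell
    def user_func():
        return Cat            # compiled as LOAD_DEREF, not LOAD_GLOBAL
    return user_func

f = make_user_func()
cell_value = f.__closure__[0].cell_contents
```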

aaronasterling
This answer blew my mind.
Michael Foukarakis
Awesome work indeed! Personally, I find the first approach (involving the function's closure cells) cleaner (if you could label a bytecode hack as "clean"). I think I'll give it a try using Byteplay (http://wiki.python.org/moin/ByteplayDoc)
amadaeus
@amadaeus I agree with you about the first approach being cleaner. I'm writing testing code that should work for both approaches. I'll post it when I'm done. I had no idea about the existing bytecode modules though. I'll have to look at them. Thanks for posting.
aaronasterling