I'm trying to make a function in Python that does the equivalent of compile(), but also lets me get the original string back. Let's call those two functions comp() and decomp(), for disambiguation purposes. That is,

a = comp("2 * (3 + x)", "", "eval")
eval(a, dict(x=3)) # => 12
decomp(a) # => "2 * (3 + x)"

The returned string does not have to be identical ("2*(3+x)" would be acceptable), but it needs to be basically the same ("2 * x + 6" would not be).

Here's what I've tried that doesn't work:

  • Setting an attribute on the code object returned by compile. You can't set custom attributes on code objects.
  • Subclassing code so I can add the attribute. code cannot be subclassed.
  • Setting up a WeakKeyDictionary mapping code objects to the original strings. code objects cannot be weakly referenced.
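For reference, the three failed attempts above can be probed with a small script. This is only a sketch against CPython: the names `attribute_ok`, `subclass_ok`, and `weakref_ok` are made up for illustration, and the weak-reference check is reported rather than assumed, since its outcome may depend on the interpreter version.

```python
import types
import weakref

code = compile("2 * (3 + x)", "<expr>", "eval")

# 1. Code objects have no instance __dict__, so new attributes are rejected.
try:
    code.source = "2 * (3 + x)"
    attribute_ok = True
except AttributeError:
    attribute_ok = False

# 2. The built-in code type cannot be used as a base class.
try:
    class SourceCode(types.CodeType):
        pass
    subclass_ok = True
except TypeError:
    subclass_ok = False

# 3. Weak references: record what this interpreter does instead of assuming.
try:
    weakref.ref(code)
    weakref_ok = True
except TypeError:
    weakref_ok = False

print("custom attribute:", attribute_ok)
print("subclassing:", subclass_ok)
print("weak reference:", weakref_ok)
```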

Here's what does work, with issues:

  • Passing the original code string as the filename argument to compile(). However, I then can't store a real filename there, which I'd also like to do.
  • Keeping a real dictionary mapping code objects to strings. This leaks memory, although since compiling is rare, it's acceptable for my current use case. I could probably run the keys through gc.get_referrers periodically and kill off dead ones, if I had to.
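A minimal sketch of the two workable approaches above, for comparison. The function names and the `_sources` registry are hypothetical, chosen only for this illustration; the registry version keeps every code object alive until its entry is removed, which is the leak described above.

```python
# Approach 1: smuggle the source through the filename slot.
def comp_via_filename(source, mode):
    # The real filename is lost: the source string takes its place.
    return compile(source, source, mode)

def decomp_via_filename(code):
    return code.co_filename


# Approach 2: an ordinary dict from code object to source string.
# Entries are never dropped here, so the dict grows with every call.
_sources = {}

def comp_via_registry(source, filename, mode):
    code = compile(source, filename, mode)
    _sources[code] = source
    return code

def decomp_via_registry(code):
    return _sources[code]


a = comp_via_filename("2 * (3 + x)", "eval")
b = comp_via_registry("2 * (3 + x)", "<expr>", "eval")
print(eval(a, dict(x=3)), decomp_via_filename(a))
print(eval(b, dict(x=3)), decomp_via_registry(b))
```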
+4  A: 

My approach would be to wrap the code object in another object. Something like this:

class CodeObjectEnhanced(object):
    def __init__(self, *args):
        # args are passed straight through to compile(); args[0] is the source.
        self.compiled = compile(*args)
        self.original = args[0]


def comp(*args):
    return CodeObjectEnhanced(*args)

Then whenever you need the code object itself, you use a.compiled, and whenever you need the original source, you use a.original. eval() only accepts strings and code objects, so you can't pass the wrapper to it directly; either call eval(a.compiled, ...) or give the class a method that does so.
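One way to make the wrapper more convenient is to let it do the evaluating itself. The sketch below assumes a hypothetical `evaluate()` method and a `decomp()` helper that are not part of the answer above; it is one possible shape, not the only one.

```python
class CodeObjectEnhanced(object):
    def __init__(self, source, filename, mode):
        self.compiled = compile(source, filename, mode)
        self.original = source

    def evaluate(self, globals_=None, locals_=None):
        # Hypothetical convenience method: forwards to the builtin eval()
        # with the wrapped code object. Defaults to an empty namespace.
        if globals_ is None:
            globals_ = {}
        return eval(self.compiled, globals_, locals_)


def comp(source, filename, mode):
    return CodeObjectEnhanced(source, filename, mode)


def decomp(wrapper):
    return wrapper.original


a = comp("2 * (3 + x)", "<expr>", "eval")
print(a.evaluate(dict(x=3)))
print(decomp(a))
```

The cost of this design is that callers hold the wrapper rather than a real code object, so anything that insists on a code object still needs a.compiled.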

One advantage of this is that the original string is freed at the same time as the code object. However you do this, I think storing the original string is probably the best approach, as you end up with the exact string you used, not just an approximation.

Silverfish
+5  A: 

This is kind of a weird problem, and my initial reaction is that you might be better off doing something else entirely to accomplish whatever it is you're trying to do. But it's still an interesting question, so here's my crack at it: I store the original source as an extra, unused constant of the code object.

def comp(source, *args, **kwargs):
    """Compile the source string; takes the same arguments as builtin compile().
    Modifies the resulting code object so that the original source can be
    recovered with decomp()."""
    c = compile(source, *args, **kwargs)
    # Append the source as one extra constant. No bytecode refers to it,
    # so the behavior of the code object is unchanged.
    # code.replace() needs Python 3.8+; older versions have to rebuild the
    # object with the full types.CodeType(...) constructor instead.
    return c.replace(co_consts=c.co_consts + (source,))

def decomp(code_object):
    # comp() appended the source last, so it is always the final constant.
    return code_object.co_consts[-1]


>>> a = comp('2 * (3 + x)', '', 'eval')
>>> eval(a, dict(x=3))
12
>>> decomp(a)
'2 * (3 + x)'
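One caveat with reading the last constant blindly is that a code object that never went through comp() would return some unrelated constant. A possible variation, sketched here with hypothetical names (`_MARKER`, `comp_tagged`, `decomp_tagged`) and assuming Python 3.8+ for `code.replace()`, tags the stored source so the two cases can be told apart:

```python
_MARKER = "__original_source__"  # hypothetical tag, any unique value works


def comp_tagged(source, filename, mode):
    code = compile(source, filename, mode)
    # Store (marker, source) as one extra constant that no bytecode uses.
    return code.replace(co_consts=code.co_consts + ((_MARKER, source),))


def decomp_tagged(code):
    last = code.co_consts[-1] if code.co_consts else None
    if isinstance(last, tuple) and len(last) == 2 and last[0] == _MARKER:
        return last[1]
    raise ValueError("code object was not produced by comp_tagged()")


a = comp_tagged("2 * (3 + x)", "<expr>", "eval")
print(eval(a, dict(x=3)))
print(decomp_tagged(a))
```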
Miles