
Suppose I need to create my own small DSL that would use Python to describe a certain data structure. E.g. I'd like to be able to write something like

f(x) = some_stuff(a,b,c)

and have Python, instead of complaining about undeclared identifiers or attempting to invoke the function some_stuff, convert it into a literal expression that I can process further.

It is possible to get a reasonable approximation to this by creating a class with properly redefined __getattr__ and __setattr__ methods and using it as follows:

e = Expression()
e.f[e.x] = e.some_stuff(e.a, e.b, e.c)
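A minimal Python 3 sketch of such an Expression class, assuming the symbols only need to record calls, indexing and assignments. The helper classes Name, Call and Index are hypothetical names chosen for this illustration; every `e.<name>` becomes a symbolic Name, and assignments are collected into `e.definitions` instead of being executed.

```python
class Node:
    """Base class for recorded (not evaluated) expression nodes."""

    def __call__(self, *args):
        # e.some_stuff(e.a) records a call instead of performing one
        return Call(self, args)

    def __getitem__(self, key):
        # e.f[e.x] used as a value records an index operation
        return Index(self, key)


class Name(Node):
    def __init__(self, name, owner):
        self.name = name
        self.owner = owner  # the Expression that collects definitions

    def __setitem__(self, key, value):
        # e.f[e.x] = rhs lands here: record it as a definition
        self.owner.definitions.append((Index(self, key), value))

    def __repr__(self):
        return self.name


class Call(Node):
    def __init__(self, func, args):
        self.func = func
        self.args = args

    def __repr__(self):
        return f"{self.func!r}({', '.join(map(repr, self.args))})"


class Index(Node):
    def __init__(self, target, key):
        self.target = target
        self.key = key

    def __repr__(self):
        return f"{self.target!r}[{self.key!r}]"


class Expression:
    def __init__(self):
        # Store the real attribute directly, bypassing our own __setattr__
        object.__setattr__(self, "definitions", [])

    def __getattr__(self, name):
        # Only called for names not found normally, so every e.<name> is symbolic
        return Name(name, self)

    def __setattr__(self, name, value):
        # e.g = rhs records a plain (non-indexed) definition
        self.definitions.append((Name(name, self), value))


e = Expression()
e.f[e.x] = e.some_stuff(e.a, e.b, e.c)
print(e.definitions)  # [(f[x], some_stuff(a, b, c))]
```

Because __getattr__ is only consulted for missing attributes, the real `definitions` list stays reachable while every other name is quoted.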

It would be nice, though, to get rid of the annoying "e." prefixes and maybe even avoid the use of []. So I was wondering: is it possible to somehow temporarily "redefine" global name lookups and assignments? On a related note, are there good packages for easily achieving such "quoting" of Python expressions?

+1  A: 

You might want to take a look at the ast or parser modules included with Python to parse, access and transform the abstract syntax tree (or parse tree, respectively) of the input code. As far as I know, the Sage mathematical system, written in Python, has a similar sort of precompiler.
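For illustration, a minimal Python 3 sketch of the ast route. The parser module was deprecated in Python 3.9 and removed in 3.10, so only ast is used here, and ast.unparse requires Python 3.9 or later. Note that `f(x) = ...` is rejected by Python's grammar (it is an assignment to a function call), so the sketch uses `f[x]` as elsewhere in this thread. split_definitions is a hypothetical helper, not part of any library.

```python
import ast


def split_definitions(source):
    """Parse DSL source with Python's own parser and return (lhs, rhs) pairs.

    Nothing is executed, so undefined names such as some_stuff are harmless.
    """
    pairs = []
    for stmt in ast.parse(source).body:
        # Accept only simple `target = value` statements
        if not isinstance(stmt, ast.Assign) or len(stmt.targets) != 1:
            raise SyntaxError(f"unsupported statement: {ast.unparse(stmt)}")
        # ast.unparse turns each subtree back into normalized source text
        pairs.append((ast.unparse(stmt.targets[0]), ast.unparse(stmt.value)))
    return pairs


print(split_definitions("f[x] = some_stuff(a, b, c)\ng.i = x * 2"))
# [('f[x]', 'some_stuff(a, b, c)'), ('g.i', 'x * 2')]
```

The catch, as the comment below points out, is that the DSL code has to be passed around as a string.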

AKX
Thanks, that's one possibility indeed. Nonetheless, it's not exactly what I'd like, since it would require me to separate my "expressions to be parsed" from ordinary Python by passing them around as strings. It would be much nicer if I could keep using the standard parser and simply redefine the evaluation behaviour (`__getattr__` and the like).
KT
+3  A: 

I'm not sure it's a good idea, but I thought I'd give it a try. To summarize:

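class PermissiveDict(dict):
    # Value returned for any name that has not been assigned
    default = None

    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # Unknown name: hand back the default instead of raising
            return self.default

def exec_with_default(code, default=None):
    ns = PermissiveDict()
    ns.default = default
    # Python 2 statement form: run the code with ns as its namespace
    exec code in ns
    return ns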
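A Python 3 sketch of the same idea, for readers not on Python 2. There, exec is a function and print is an ordinary built-in, so a namespace that answers every unknown name with the default would also turn print into the default; this version therefore falls back to builtins first. Because the Python documentation asks for a plain dict when only globals is passed to exec, the permissive mapping is handed over as locals, which may be any mapping. Using __missing__ rather than overriding __getitem__ is a design choice of this sketch.

```python
import builtins


class PermissiveDict(dict):
    """Namespace that answers unknown names with a default instead of raising."""

    def __init__(self, default=None):
        super().__init__()
        self.default = default

    def __missing__(self, name):
        # dict.__getitem__ calls this only when the key is absent.
        # Try builtins first so print, len, etc. keep working.
        if hasattr(builtins, name):
            return getattr(builtins, name)
        return self.default


def exec_with_default(source, default=None):
    ns = PermissiveDict(default)
    # Plain dict as globals, permissive mapping as locals:
    # module-level name lookups (LOAD_NAME) consult locals first
    exec(source, {}, ns)
    return ns


ns = exec_with_default("greeting = hello\nsize = len([1, 2, 3])", "world")
print(ns["greeting"], ns["size"])  # world 3
```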
Ian Bicking
Awesome! "exec ... in ..." is more or less what I was looking for. I can now put this into a decorator and have some magic happen inside a given function. It's probably impossible to make it more syntax-sugary than that. Thanks!
KT
Heh, unfortunately this solution works differently depending on where the code object comes from. If it is a string or the result of compile(), things work as expected. However, if the code object is something like my_function.func_code, it fails because of the peculiarities of variable binding within a function: the variables won't be looked up in the provided environment. This can be overcome by bytecode mangling (see code_all_variables_dynamic here: http://code.activestate.com/recipes/498242/), though that makes the whole thing far more complicated than it could have been.
KT
Yes... these problems won't ever really stop, which is why it would be better to work inside the scope of what's available without hacks if at all possible.
Ian Bicking
A: 

I am also trying to create a DSL, and I'm also wondering what tricks I could use to make the expressions expressive and beautiful. Composing the expression as a string and passing it to exec ... in ... doesn't feel quite right to me.

I have an idea, though I'm only at a starting point so far. I could compose the expression using a lambda, like your second expression:

  f = lambda x: some_stuff(x.a, x.b, x.c)

Those x's really feel superfluous here, so I would elide them:

  f = lambda: some_stuff(a, b, c)

This is as clean an expression as I can think of, and it passes the Python compiler. But how can I execute it? I still haven't figured that out. In my case I may need some ast magic, because my functions actually have different semantics from Python's.

Let me know if you have found a useful technique. I may look at the Sage mathematical system too, to see if I can learn something.

Wai Yip Tung
Sage won't provide the answers you are looking for, because it literally re-parses the code. Note that in my specific case I found that having "markers" like the x above is still important, but otherwise I can offer you a fun solution. I'll post it as a separate answer below.
KT
A: 

In response to Wai's comment, here's one fun solution that I've found. First of all, to explain once more what it does, suppose that you have the following code:

definitions = Structure()
definitions.add_definition('f[x]', 'x*2')
definitions.add_definition('f[z]', 'some_function(z)')
definitions.add_definition('g.i', 'some_object[i].method(param=value)')

where adding a definition involves parsing the left-hand side and the right-hand side and doing other ugly stuff. One (not necessarily good, but certainly fun) approach would allow writing the above code as follows:

@my_dsl
def definitions():
    f[x] = x*2
    f[z] = some_function(z)
    g.i  = some_object[i].method(param=value)

and have Python do most of the parsing under the hood. The idea is based on the simple exec <code> in <environment> statement mentioned by Ian, with one hackish addition: the bytecode of the function must be slightly tweaked so that variable access operations (LOAD_FAST for locals, LOAD_GLOBAL for globals) are switched to lookups in the environment (LOAD_NAME), with the code object's flags adjusted to match.

It is easier to show than to explain: http://kt.pri.ee/stuff/pydsl/

There are various tricks you may want to use to make it practical. For example, in the code presented at the link above you can't use built-in functions or language constructs such as for loops and if statements within a @my_dsl function. You can make them work, however, by adding more behaviour to the Env class.
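As an alternative that avoids rewriting bytecode (this is not the approach in the linked code), a minimal sketch relying on CPython 3 behaviour: when a function's globals mapping is a dict subclass rather than an exact dict, LOAD_GLOBAL falls back to an ordinary item lookup, which honours __missing__. The decorator below rebuilds the function with such a mapping and runs it once. Sym, SymbolTable and dsl_via_globals are hypothetical names; only `*`, calls and `f[x] = ...` definitions are supported, closures are not, and the decorated body cannot see module-level helpers.

```python
import builtins
import types


class Sym:
    """A symbolic value: operations on it build new Syms instead of computing."""

    def __init__(self, text, log):
        self.text = text
        self.log = log  # shared list that collects recorded definitions

    def __repr__(self):
        return self.text

    def __mul__(self, other):
        return Sym(f"{self!r} * {other!r}", self.log)

    def __call__(self, *args):
        return Sym(f"{self!r}({', '.join(map(repr, args))})", self.log)

    def __setitem__(self, key, value):
        # `f[x] = rhs` inside the decorated function lands here
        self.log.append((f"{self!r}[{key!r}]", repr(value)))


class SymbolTable(dict):
    """Globals mapping: unknown names become Syms, builtins still resolve."""

    def __init__(self):
        super().__init__()
        self.log = []

    def __missing__(self, name):
        if hasattr(builtins, name):
            return getattr(builtins, name)
        return Sym(name, self.log)


def dsl_via_globals(func):
    """Re-run func's bytecode against a SymbolTable and return the recorded log."""
    env = SymbolTable()
    # Same code object, new globals mapping (no closure support here)
    symbolic = types.FunctionType(func.__code__, env, func.__name__)
    symbolic()
    return env.log


@dsl_via_globals
def definitions():
    f[x] = x * 2
    f[z] = some_function(z)


print(definitions)  # [('f[x]', 'x * 2'), ('f[z]', 'some_function(z)')]
```

Falling back to builtins inside __missing__ is what keeps names like len usable, which addresses the builtins limitation mentioned above.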

Update. Here is a slightly more verbose explanation of the same thing.

KT
A: 

Regarding Ian's answer: interestingly, this doesn't work:

def f():
  print hello

exec_with_default(f.func_code, 'world')

Any idea why?

Evgeny
As I see it, Python compiles functions in a way that binds them to the existing globals and does not allow those to be "switched" easily. The first obvious difference is seen when you compare dis.dis(f.func_code) with dis.dis(compile('print hello','','exec')): the former uses LOAD_GLOBAL, the latter LOAD_NAME. The second is in the flags of the code object (these are poorly documented, but my gut feeling is that one of them is responsible for how Python interprets the LOAD_GLOBAL instruction). See the code in my answer above, where I mangle the bytecode to rename some instructions and change the flags.
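The opcode and flag differences described here can be inspected directly. The sketch below is Python 3 (f.__code__ replaces f.func_code, and print is a function); load_ops is a hypothetical helper. It only inspects the two code objects and does not run them.

```python
import dis
import inspect


def f():
    print(hello)


def load_ops(code):
    """Sorted names of the LOAD_GLOBAL / LOAD_NAME instructions in a code object."""
    return sorted({ins.opname for ins in dis.get_instructions(code)
                   if ins.opname in ("LOAD_GLOBAL", "LOAD_NAME")})


module_code = compile("print(hello)", "<string>", "exec")
print(load_ops(f.__code__))   # ['LOAD_GLOBAL']
print(load_ops(module_code))  # ['LOAD_NAME']

# Function code is flagged for a fresh, optimized locals frame; module-level code is not
print(bool(f.__code__.co_flags & inspect.CO_NEWLOCALS))   # True
print(bool(module_code.co_flags & inspect.CO_NEWLOCALS))  # False
```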
KT