views:

69

answers:

1

I need to be able to take a formula that uses the OpenDocument formula syntax, parse it into syntax that Python can understand, but without evaluating the variables, and then be able to evaluate the formula many times with changing valuables for the variables. Formulas can be user input, so pyparsing allows me to both effectively handle the formula syntax, and clean user input. There are a number of good examples of pyparsing available, but all the mathematical ones seem to assume that one evaluates everything in the current scope immediately.

For context, I am working with a model of the industrial economy (life cycle assessment, or LCA), where these formulas represent the amount of material or energy exchanges between processes. The variable amount can be a function of several parameters, such as geographical location. THe chain of formula and variable references are stored in a directed acyclic graph, so that formulas can always be simply evaluated. Formulas are stored as strings in a database. My questions are:

  1. Is it possible to parse a formula such that the parsed evaluation can also be stored in the database (as a string to be evaled, or something else)?
  2. Are there alternatives to this approach? Bear in mind that the ideal solution is to parse/write once, and read many times. For example, partially parsing the formula, and then using the ast module, although I don't know how this could work with database storage.
  3. Any examples of a project or library similar to this that I could look over? I am not a programmer, just a student trying to finish his thesis while making an open-source LCA software model in my spare time.
  4. Is this approach too slow? I would like to be able to do substantial Monte Carlo runs, where each run could involve tens of thousands of formula evaluations (it is a big database).
+3  A: 

1) Yes, it is possible to pickle the results from parsing your expression, and save that to a database. Then you can just fetch and unpickle the expression, rather than reparse the original again.

2) You can do a quick-and-dirty pass at this just using the compile and eval built-ins, as in the following interactive session:

>>> y = compile("m*x+b","","eval")
>>> m = 100
>>> x = 5
>>> b = 1
>>> eval(y)
501

Of course, this has the security pitfalls of any eval- or exec-based implementation, in that untrusted or malicious source strings can embed harmful system calls. But if this is your thesis and entirely within your scope of control, just don't do anything foolish.

3) You can get an online example of parsing an expression into a "evaluatable" data structure at the pyparsing wiki's Examples page. Check out simpleBool.py and evalArith.py especially. If you're feeling flush, order a back issue of the May,2008 issue of Python magazine, which has my article "Writing a Simple Interpreter/Compiler with Pyparsing" with a more detailed description of the methods used, plus a description of how pickling and unpickling the parsed results works.

4) The slow part will be the parsing, so you are on the right track in preserving these results in some intermediate and repeatably-evaluatable form. The eval part should be fairly snappy. The second slow part will be in fetching these pickled structures from your database. During your MC run, I would package a single function that takes the selection parameters for an expression, fetches from the database, and unpickles and returns the evaluatable expression. Then once you have this working, use a memoize decorator to cache these query-results pairs, so that any given expression only needs to be fetched/unpickled once.

Good luck with your thesis!

Paul McGuire