views:

331

answers:

3

How do I let my users apply their own custom formula to a table of data to derive new fields?

I am working on a Django application which is going to store and process a lot of data for subscribed users on the open web. Think 100-10,000 sensor readings in one page request. I am going to be drawing graphs using this data and also showing tables containing it. I expect groups of sensors to be defined by my users, who will have registered themselves on my website (i.e they correspond with a django model).

I would like to allow the user to be able to create fields that are derived from their sensor data (as part of a setup process). For example, the user might know that their average house temperature is (temperature sensor1 + temperature sensor2) / 2 and want to show that on the graph. They might also want something more interesting like solar hot water heated is (temp out - temp in) * flow * conversion constant. I will then save these defined formulas for them and everyone else who views this page of sensor data.

The main question is how do I define the formula at the centre of the system. Do I just have a user-defined string to define the formula (say 100 chars long) and parse it myself - replace the user defined with an input sample and call it toast?

Update

In the end I got just the answer I asked for : A safe way to evaluate a stored user function on the server. Evaluating the same function also on the client when the function is being defined will be a great way to make the UI intuitive.

A: 

Another user asked a similar question in C. In that post, Warren suggested that the formula could be parsed and converted from

(a + c) / b

Into reverse polish notation

a c + b /

Which is easier to process.

In this case, you could intercept the formula model's save and generate the postfix notation from the user-defined formula. Once you have postfix notation, it is fairly straightforward to write a loop that evaluates the formula from left to right.

As for implementation in Django, the core question remaining is how to map different input fields into the formula. The simple solution would be a model representing the derived field uses a many-to-many relationship with the symbol name ("a", "b" or "c") defined per-input.

If performance is really critical, you might somehow further pre-process the postfix formula before applying it to the data.

Tom Leys
Your question is interesting, but this answer is hard to understand. It seems terse and doesn't explain anything.
Van Gale
Thankyou for the comment, I hope the new version is easier to understand.
Tom Leys
+2  A: 

I would work out what operations you want to support [+,-,*,/,(,),etc] and develop client side (javascript) to edit and apply those values to new fields of the data. I don't see the need to do any of this server-side and you will end up with a more responsive and enjoyable user experience as a result.

If you allow the user to save their formulas and re-load them when they revisit the site, you can get their browser to do all the calculations. Just provide some generic methods to add columns of data which are generated by applying one of their forumla's to your data.

I imagine the next step would be to allow them to apply those operations to the newly generated columns.

Have you considered posting their data into a google spreadsheet? This would save a lot of the development work as they already allow you to define formulas etc. and apply it to the data. I'm not too sure of the data limit (how much data you can post and operate on) mind you.

Jon Cage
I totally agree on the javascript to improve the experience. I hadn't considered the possibility of people adding formulas "on the fly" so to speak, beyond some sort of UI to define them initially. I plan to track users of the site (they are going to be registered customers) so saving the formulas should be pretty simple. I am already considering using a YUI DataTable (http://developer.yahoo.com/yui/datatable/).
Tom Leys
Well javascript should make interpreting the formulas fairly easy as long as you can restrict what your users can operate on.
Jon Cage
The other thing to consider is that you're going to end up with a lot of data if you store their results each time. The YUI data table looks like a good place to start. I'd use flotr for your graphing (http://solutoire.com/flotr/).
Jon Cage
I don't plan to store the results, just the formulas. I will re-calculate the data on the fly, possibly many times over several days. With caching on the server it should be plenty fast.
Tom Leys
Javascript is a good choice for this. It will also increase the scalability of your server code since the processing will be the burden of the browser. Also, if the user does manage to create something like an endless loop, it will only effect their browser- not the server.I am doing something similar on http://ETFtable.com I download a lot of historical data to the user's browser and run it against technical analysis functions in javascript, then output the data to a YUI Datatable.
NathanD
It is quite important to my application that sensors (and custom fields) can be monitored using pretty low-end cellphones with a web browser. While I imagine that iPhones can handle decent levels of Javascript I don't know if the same is true of your typical lynx browser or symbian mobile
Tom Leys
I think that customising formulas in a javascript interface "live" and then saving them for server-side processing if you control the page of sensor results should be a good balance.
Tom Leys
You may be able to evaluate the user-defined javascript formulas on the client side and/or server side. On the server side you may be able to interface with google's V8 or another javascript VM through Python. And on the browser with just the default JS engine for the browser. On the server side you would need to be careful to inspect the javascript formula for anything hazardous to your server environment.
NathanD
+2  A: 

Depends on who your clients are.

If this is "open to the public" on the WWW, you have to parse expressions yourself. You can use the Python compiler to compile Python syntax. You can also invent your own compiler for a subset of Python syntax. There are lots of examples; start with the ply project.

If this is in-house ("behind the firewall") let the post a piece of Python code and exec that code.

Give them an environment from math import * functionality available.

Fold the following around their supplied line of code:

def userFunc( col1, col2, col3, ... ):
   result1= {{ their code goes here }}
   return result1

Then you can exec the function definition and use the defined function without bad things happening.

While some folks like to crow that exec is "security problem", it's no more a security problem than user's sharing passwords, and admin's doing intentionally stupid things like deleting important files or turning the power off randomly while your programming is running.

exec is only a security problem if you allow anyone access to it. For in-house applications, you know the users. Train them.

S.Lott
Thanks for the pointer to the ply project. This is an open to the WWW approach and I fully expect members of the public to just wander in. This does rule out exec because crackers are much more creative than me.
Tom Leys
The very first PLY example (see http://www.dabeaz.com/ply/example.html) is exactly what I need. Thankyou very much
Tom Leys