views:

227

answers:

9

Hello everybody. I have to write, for academic purposes, an application that plots user-input expressions like: f(x) = 1 - exp(3^(5*ln(cosx)) + x)

The approach I've chosen to write the parser is to convert the expression in RPN with the Shunting-Yard algorithm, treating primitive functions like "cos" as unary operators. This means the function written above would be converted in a series of tokens like:

1, x, cos, ln, 5, *,3, ^, exp, -

The problem is that to plot the function I have to evaluate it LOTS of times, so applying the stack evaluation algorithm for each input value would be very inefficient. How can I solve this? Do I have to forget the RPN idea?

A: 

Why not keep around a parse tree (I use "tree" loosely, in your case it's a sequence of operations), and mark input variables accordingly? (e.g. for inputs x, y, z, etc. annotate "x" with 0 to signify the first input variable, "y" with 1 to signify the 2nd input variable, etc.)

That way you can parse the expression once, keep the parse tree, take in an array of inputs, and apply the parse tree to evaluate.

If you're worrying about the performance aspects of the evaluation step (vs. the parsing step), I don't think you'd do much better unless you get into vectorizing (applying your parse tree on a vector of inputs at once) or hard-coding the operations into a fixed function.

Jason S
I see what you're saying, but I've already solved the first problem by keeping the RPN tokens inside my "Function" class (which is basically a parse tree). The problem actually consists only on the performances of the evaluation routine.
Dave Danuve
A: 

Inefficient in what sense? There's machine time and programmer time. Is there a standard for how fast it needs to run with a particular level of complexity? Is it more important to finish the assignment and move on to the next one (perfectionists sometimes never finish)?

All those steps have to happen for each input value. Yes, you could have a heuristic that scans the list of operations and cleans it up a bit. Yes, you could compile some of it down to assembly instead of calling +, * etc. as high level functions. You can compare vectorization (doing all the +'s then all the *'s etc, with a vector of values) to doing the whole procedure for one value at a time. But do you need to?

I mean, what do you think happens if you plot a function in gnuplot or Mathematica?

Paul
I believe those applications use far more complex and elaborated parsing techniques. The stack evaluation algorithm is O(n) but since the expression has to be evaluated very often (every time the plot is resized for example) the algorithm has to be optimized.
Dave Danuve
Vectorization can be a real gem if you know all your x's when you want the y's. Or you can feed in an equally spaced sequence of x's take the y's and do interpolation.
Paul
+2  A: 

How much is "LOTS of times"? A million?

What kind of functions could be input? Can we assume they are continuous?

Did you try measuring how well your code performs?

(Sorry, started off with questions!)

You could try one of the two approaches (or both) described briefly below (there are probably many more):

1) Parse Trees.

You could create a Parse Tree. Then do what most compilers do to optimize expressions, constant folding, common subexpression elimination (which you could achieve by linking together the common expression subtrees and caching the result), etc.

Then you could use lazy evaluation techniques to avoid whole subtrees. For instance if you have a tree

    *
   / \
  A   B

where A evaluates to 0, you could completely avoid evaluating B as you know the result is 0. With RPN you would lose out on the lazy evaluation.

2) Interpolation

Assuming your function is continuous, you could approximate your function to a high degree of accuracy using Polynomial Interpolation. This way you can do the complicated calculation of the function a few times (based on the degree of polynomial you choose), and then do fast polynomial calculations for the rest of the time.

To create the initial set of data, you could just use approach 1 or just stick to using your RPN, as you would only be generating a few values.

So if you use Interpolation, you could keep your RPN...

Hope that helps!

Moron
"LOTS of times" means that if my plot X goes from -10 to 15 I would do h = 0.001; for (x = -10; x <= 15; x += 0.001) { y = expression.evaluate(x); // Plot results }which consists in 25,000 function calls, besides every time my plot is resized it has to be repainted.Anyways, I think a parse tree might be the way to go. Thank you for your help.
Dave Danuve
I would say, polynomial interpolation is more likely to give you significant results than when moving to parse trees and the effort might well be worth it (in fact, there will be free packages available to do this). btw, why does repainting cause you to evaluate again? Can't you just store them and use them again?
Moron
Why do you want 25,000 x's to make your plot when the screen is perhaps 1920x1080? Or is your media something else?
Paul
+2  A: 

Why reinvent the wheel? Use a fast scripting language instead. Integrating something like lua into your code will take very little time and be very fast.

You'll usually be able byte compile your expression, and that should result in code that runs very fast, certainly fast enough for simple 1D graphs.

I recommend lua as its fast, and integrates with C/C++ easier than any other scripting language. Another good options would be python, but while its better known I found it trickier to integrate.

Michael Anderson
A: 

What I do is use the shunting algorithm to produce the RPN. I then "compile" the RPN into a tokenised form that can be executed (interpretively) repeatedly without re-parsing the expression.

anon
That's exactly what I intend to do, I'm just worried about the performances of the evaluation step.
Dave Danuve
A: 

Your simple interpretation of RPN should work just fine, especially since it contains

  • math library functions like cos, exp, and ^(pow, involving logs)

  • symbol table lookup

Hopefully, your symbol table (with variables like x in it) will be short and simple.

The library functions will most likely be your biggest time-takers, so unless your interpreter is poorly written, it will not be a problem.

If, however, you really gotta go for speed, you could translate the expression into C code, compile and link it into a dll on-the-fly and load it (takes about a second). That, plus memoized versions of the math functions, could give you the best performance.

P.S. For parsing, your syntax is pretty vanilla, so a simple recursive-descent parser (about a page of code, O(n) same as shunting-yard) should work just fine. In fact, you might just be able to compute the result as you parse (if math functions are taking most of the time), and not bother with parse trees, RPN, any of that stuff.

Mike Dunlavey
+1  A: 

Michael Anderson suggested Lua. If you want to try Lua for just this task, see my ae library.

lhf
A: 

I think this RPN based library can serve the purpose: http://expressionoasis.vedantatree.com/

I used it with one of my calculator project and it works well. It is small and simple, but extensible.

Kaouni
A: 

One optimization would be to replace the stack with an array of values and implement the evaluator as a three address mechine where each operation loads from two (or one) location and saves to a third. This can make for very tight code:

struct Op {
  enum {
    add, sub, mul, div,
    cos, sin, tan,
   //....
  } op;
  int a, b, d;
}

void go(Op* ops, int n, float* v) {
  for(int i = 0; i < n; i++) {
    switch(ops[i].op) {
      case add: v[op[i].d] = v[op[i].a] + v[op[i].b]; break;
      case sub: v[op[i].d] = v[op[i].a] - v[op[i].b]; break;
      case mul: v[op[i].d] = v[op[i].a] * v[op[i].b]; break;
      case div: v[op[i].d] = v[op[i].a] / v[op[i].b]; break;
      //...
    }
  }
}

The conversion from RPN to 3-address should be easy as 3-address is a generalization.

BCS