views:

6723

answers:

19

I've developed an equation parser using a simple stack algorithm that will handle binary (+, -, |, &, *, /, etc) operators, unary (!) operators, and parenthesis.

Using this method, however, leaves me with everything having the same precedence - it's evaluated left to right regardless of operator, although precedence can be enforced using parenthesis.

So right now "1+11*5" returns 60, not 56 as one might expect.

While this is suitable for the current project, I want to have a general purpose routine I can use for later projects.

Edited for clarity:

What is a good algorithm for parsing equations with precedence?

I'm interested in something simple to implement and understand that I can code myself to avoid licensing issues with available code.

Grammar:

I don't understand the grammar question - I've written this by hand. It's simple enough that I don't see the need for YACC or Bison. I merely need to calculate strings with equations such as "2+3 * (42/13)".

Language:

I'm doing this in C, but I'm interested in an algorithm, not a language specific solution. C is low level enough that it'll be easy to convert to another language should the need arise.

Code Example

I posted the test code for the simple expression parser I was talking about above. The project requirements altered and so I never needed to optimize the code for performance or space as it wasn't incorporated into the project. It's in the original verbose form, and should be readily understandable. If I do anything further with it in terms of operator precedence, I'll probably choose the macro hack because it matches the rest of the program in simplicity. If I ever use this in a real project, though, I'll be going for a more compact/speedy parser.

Related question

Smart design of a math parser?

+3  A: 

It would help if you could describe the grammar you are currently using to parse. Sounds like the problem might lie there!

Edit:

The fact that you don't understand the grammar question and that 'you've written this by hand' very likely explains why you're having problems with expressions of the form '1+11*5' (i.e., with operator precedence). Googling for 'grammar for arithmetic expressions', for example, should yield some good pointers. Such a grammar need not be complicated:

<Exp> ::= <Exp> + <Term> |
          <Exp> - <Term> |
          <Term>

<Term> ::= <Term> * <Factor> |
           <Term> / <Factor> |
           <Factor>

<Factor> ::= x | y | ... |
             ( <Exp> ) |
             - <Factor> |
             <Number>

would do the trick for example, and can be trivially augmented to take care of some more complicated expressions (including functions for example, or powers,...).

I suggest you have a look at this thread, for example.

Almost all introductions to grammars/parsing treat arithmetic expressions as an example.

Note that using a grammar does not at all imply using a specific tool (a la Yacc, Bison,...). Indeed, you most certainly are already using the following grammar:

<Exp>  :: <Leaf> | <Exp> <Op> <Leaf>

<Op>   :: + | - | * | /

<Leaf> :: <Number> | (<Exp>)

(or something of the kind) without knowing it!

OysterD
+3  A: 

Is there a language you want to use? ANTLR will let you do this from a Java perspective. Adrian Kuhn has an excellent writeup on how to write an executable grammar in Ruby; in fact, his example is almost exactly your arithmetic expression example.

James A. Rosen
I must admit that my examples given in the blog post are getting left-recursion wrong, ie a - b - c evaluates to (a - (b -c)) instead of ((a -b) - c). Actually, that reminds me of adding a todo that I should fix the blog posts.
Adrian
+3  A: 

It depends on how "general" you want it to be.

If you want it to be really really general such as be able to parse mathematical functions as well like sin(4+5)*cos(7^3) you will probably need a parse tree.

In which, I do not think that a complete implementation is proper to be pasted here. I'd suggest that you check out one of the infamous "Dragon book".

But if you just want precedence support, then you could do that by first converting the expression to postfix form in which an algorithm that you can copy-and-paste should be available from google or I think you can code it up yourself with a binary tree.

When you have it in postfix form, then it's piece of cake from then on since you already understand how the stack helps.

chakrit
The dragon book might be a little excessive for an expression evaluator - a simple recursive descent parser is all that's needed, but it's a must read if you want to do anything more extensive in compilers.
Eclipse
+14  A: 
Jared Updike
To emphasize my point, note that the markup in my post is not getting parsed correctly (and this varies between the markup rendered statically and that rendered in the WMD preview). There have been several attempts to fix it but I think THE PARSER IS WRONG. Do everyone a favor and get parsing right!
Jared Updike
+3  A: 

I would suggest cheating and using the Shunting Yard Algorithm. It's an easy means of writing a simple calculator-type parser and takes precedence into account.

If you want to properly tokenise things and have variables, etc. involved then I would go ahead and write a recursive descent parser as suggested by others here, however if you simply require a calculator-style parser then this algorithm should be sufficient :-)

kronoz
+22  A: 

The shunting yard algorithm is the right tool for this. Wikipedia is really confusing about this, but basically the algorithm works like this:

Say, you want to evaluate 1 + 2 * 3 + 4. Intuitively, you "know" you have to do the 2 * 3 first, but how do you get this result? The key is to realize that when you're scanning the string from left to right, you will evaluate an operator when the operator that follows it has a lower (or equal to) precedence. In the context of the example, here's what you want to do:

  1. Look at: 1 + 2, don't do anything.
  2. Now look at 1 + 2 * 3, still don't do anything.
  3. Now look at 1 + 2 * 3 + 4, now you know that 2 * 3 has to to be evaluated because the next operator has lower precedence.

How do you implement this?

You want to have two stacks, one for numbers, and another for operators. You push numbers onto the stack all the time. You compare each new operator with the one at the top of the stack, if the one on top of the stack has higher priority, you pop it off the operator stack, pop the operands off the number stack, apply the operator and push the result onto the number stack. Now you repeat the comparison with the top of stack operator.

Coming back to the example, it works like this:

N = [ ] Ops = [ ]

  • Read 1. N = [1], Ops = [ ]
  • Read +. N = [1], Ops = [+]
  • Read 2. N = [1 2], Ops = [+]
  • Read *. N = [1 2], Ops = [+ *]
  • Read 3. N = [1 2 3], Ops = [+ *]
  • Read +. N = [1 2 3], Ops = [+ *]
    • Pop 3, 2 and execute 2*3, and push result onto N. N = [1 6], Ops = [+]
    • + is left associative, so you want to pop 1, 6 off as well and execute the +. N = [7], Ops = [].
    • Finally push the [+] onto the operator stack. N = [7], Ops = [+].
  • Read 4. N = [7 4]. Ops = [+].
  • You're run out off input, so you want to empty the stacks now. Upon which you will get the result 11.

There, that's not so difficult, is it? And it makes no invocations to any grammars or parser generators.

Pramod
You don't actually need two stacks, as long as you can see the second thing on the stack without popping the top. You can instead use a single stack that alternates numbers and operators. This in fact corresponds to exactly what an LR parser generator (such as bison) does.
Chris Dodd
+1 for an excellent explanation of shunting yard.
Software Monkey
Really nice explanation of the algorithm I just implemented right now. Also you are not converting it to postfix which is also nice. Adding support for parenthesis is very easy too.
Giorgi
A simplified version for the shunting-yard algorithm can be found here: http://andreinc.net/2010/10/05/converting-infix-to-rpn-shunting-yard-algorithm/ (with implementations in Java and python)
Andrei Ciobanu
+4  A: 

Long time ago, I made up my own parsing algorithm, that I couldn't find in any books on parsing (like the Dragon Book). Looking at the pointers to the Shunting Yard algorithm, I do see the resemblance.

About 2 years ago, I made a post about it, complete with Perl source code, on http://www.perlmonks.org/?node_id=554516. It's easy to port to other languages: the first implementation I did was in Z80 assembler.

It's ideal for direct calculation with numbers, but you can use it to produce a parse tree if you must.

Update Because more people can read (or run) Javascript, I've reimplemented my parser in Javascript, after the code has been reorganized. The whole parser is under 5k of Javascript code (about 100 lines for the parser, 15 lines for a wrapper function) including error reporting, and comments.

You can find a live demo at http://users.telenet.be/bartl/expressionParser/expressionParser.html.

// operator table
var ops = {
   '+'  : {op: '+', precedence: 10, assoc: 'L', exec: function(l,r) { return l+r; } },
   '-'  : {op: '-', precedence: 10, assoc: 'L', exec: function(l,r) { return l-r; } },
   '*'  : {op: '*', precedence: 20, assoc: 'L', exec: function(l,r) { return l*r; } },
   '/'  : {op: '/', precedence: 20, assoc: 'L', exec: function(l,r) { return l/r; } },
   '**' : {op: '**', precedence: 30, assoc: 'R', exec: function(l,r) { return Math.pow(l,r); } }
};

// constants or variables
var vars = { e: Math.exp(1), pi: Math.atan2(1,1)*4 };

// input for parsing
// var r = { string: '123.45+33*8', offset: 0 };
// r is passed by reference: any change in r.offset is returned to the caller
// functions return the parsed/calculated value
function parseVal(r) {
    var startOffset = r.offset;
    var value;
    var m;
    // floating point number
    // example of parsing ("lexing") without aid of regular expressions
    value = 0;
    while("0123456789".indexOf(r.string.substr(r.offset, 1)) >= 0 && r.offset < r.string.length) r.offset++;
    if(r.string.substr(r.offset, 1) == ".") {
        r.offset++;
        while("0123456789".indexOf(r.string.substr(r.offset, 1)) >= 0 && r.offset < r.string.length) r.offset++;
    }
    if(r.offset > startOffset) {  // did that work?
        // OK, so I'm lazy...
        return parseFloat(r.string.substr(startOffset, r.offset-startOffset));
    } else if(r.string.substr(r.offset, 1) == "+") {  // unary plus
        r.offset++;
        return parseVal(r);
    } else if(r.string.substr(r.offset, 1) == "-") {  // unary minus
        r.offset++;
        return negate(parseVal(r));
    } else if(r.string.substr(r.offset, 1) == "(") {  // expression in parens
        r.offset++;   // eat "("
        value = parseExpr(r);
        if(r.string.substr(r.offset, 1) == ")") {
            r.offset++;
            return value;
        }
        r.error = "Parsing error: ')' expected";
        throw 'parseError';
    } else if(m = /^[a-z_][a-z0-9_]*/i.exec(r.string.substr(r.offset))) {  // variable/constant name        
        // sorry for the regular expression, but I'm too lazy to manually build a varname lexer
        var name = m[0];  // matched string
        r.offset += name.length;
        if(name in vars) return vars[name];  // I know that thing!
        r.error = "Semantic error: unknown variable '" + name + "'";
        throw 'unknownVar';        
    } else {
        if(r.string.length == r.offset) {
            r.error = 'Parsing error at end of string: value expected';
            throw 'valueMissing';
        } else  {
            r.error = "Parsing error: unrecognized value";
            throw 'valueNotParsed';
        }
    }
}

function negate (value) {
    return -value;
}

function parseOp(r) {
    if(r.string.substr(r.offset,2) == '**') {
        r.offset += 2;
        return ops['**'];
    }
    if("+-*/".indexOf(r.string.substr(r.offset,1)) >= 0)
        return ops[r.string.substr(r.offset++, 1)];
    return null;
}

function parseExpr(r) {
    var stack = [{precedence: 0, assoc: 'L'}];
    var op;
    var value = parseVal(r);  // first value on the left
    for(;;){
        op = parseOp(r) || {precedence: 0, assoc: 'L'}; 
        while(op.precedence < stack[stack.length-1].precedence ||
              (op.precedence == stack[stack.length-1].precedence && op.assoc == 'L')) {  
            // precedence op is too low, calculate with what we've got on the left, first
            var tos = stack.pop();
            if(!tos.exec) return value;  // end  reached
            // do the calculation ("reduce"), producing a new value
            value = tos.exec(tos.value, value);
        }
        // store on stack and continue parsing ("shift")
        stack.push({op: op.op, precedence: op.precedence, assoc: op.assoc, exec: op.exec, value: value});
        value = parseVal(r);  // value on the right
    }
}

function parse (string) {   // wrapper
    var r = {string: string, offset: 0};
    try {
        var value = parseExpr(r);
        if(r.offset < r.string.length){
          r.error = 'Syntax error: junk found at offset ' + r.offset;
            throw 'trailingJunk';
        }
        return value;
    } catch(e) {
        alert(r.error + ' (' + e + '):\n' + r.string.substr(0, r.offset) + '<*>' + r.string.substr(r.offset));
        return;
    }    
}
bart
+1  A: 

I've implemented a recursive descent parser in Java in the MathEclipse Parser project. It could also be used in as a Google Web Toolkit module

axelclk
+4  A: 

There's a nice article here about combining a simple recursive-descent parser with operator-precedence parsing. If you've been recently writing parsers, it should be very interesting and instructive to read.

Eli Bendersky
+1  A: 

I'm currently working on a series of articles building a regular expression parser as a learning tool for design patterns and readable programing. You can take a look at readablecode. The article presents a clear use of shunting yards algorithm.

Goran
+1  A: 

Recursive decent parsing made easy in Pascal with precedence and variables etc is explained nicely here:

Lets build a compiler

+1  A: 

I wrote an expression parser in F# and blogged about it here. It uses the shunting yard algorithm, but instead of converting from infix to RPN, I added a second stack to accumulate the results of calculations. It correctly handles operator precedence, but doesn't support unary operators. I wrote this to learn F#, not to learn expression parsing, though.

Erik Forbes
+1  A: 

I have writen an expression parser in C# on my blog. It does infix to postfix without the stack in the shunting yard algorithm. It only uses an array.

Guge
+2  A: 

I found this on the PIClist about the Shunting Yard algorithm:

Harold writes:

I remember reading, a long time ago, of an algorithm that converted algebraic expressions to RPN for easy evaluation. Each infix value or operator or parenthesis was represented by a railroad car on a track. One type of car split off to another track and the other continued straight ahead. I don't recall the details (obviously!), but always thought it would be interesting to code. This is back when I was writing 6800 (not 68000) assembly code.

This is the "shunting yard algorythm" and it is what most machine parsers use. See the article on parsing in Wikipedia. An easy way to code the shunting yard algorythm is to use two stacks. One is the "push" stack and the other the "reduce" or "result" stack. Example:

pstack = () // empty rstack = () input: 1+2*3 precedence = 10 // lowest reduce = 0 // don't reduce

start: token '1': isnumber, put in pstack (push) token '+': isoperator set precedence=2 if precedence < previous_operator_precedence then reduce() // see below put '+' in pstack (push) token '2': isnumber, put in pstack (push) token '*': isoperator, set precedence=1, put in pstack (push) // check precedence as // above token '3': isnumber, put in pstack (push) end of input, need to reduce (goal is empty pstack) reduce() //done

to reduce, pop elements from the push stack and put them into the result stack, always swap the top 2 items on pstack if they are of the form 'operator' 'number':

pstack: '1' '+' '2' '' '3' rstack: () ... pstack: () rstack: '3' '2' '' '1' '+'

if the expression would have been:

1*2+3

then the reduce trigger would have been the reading of the token '+' which has lower precendece than the '*' already pushed, so it would have done:

pstack: '1' '' '2' rstack: () ... pstack: () rstack: '1' '2' ''

and then pushed '+' and then '3' and then finally reduced:

pstack: '+' '3' rstack: '1' '2' '' ... pstack: () rstack: '1' '2' '' '3' '+'

So the short version is: push numbers, when pushing operators check the precedence of the previous operator. If it was higher than the operator's that is to be pushed now, first reduce, then push the current operator. To handle parens simply save the precedence of the 'previous' operator, and put a mark on the pstack that tells the reduce algorythm to stop reducing when solving the inside of a paren pair. The closing paren triggers a reduction as does the end of input, and also removes the open paren mark from the pstack, and restores the 'previous operation' precedence so parsing can continue after the close paren where it left off. This can be done with recursion or without (hint: use a stack to store the previous precedence when encountering a '(' ...). The generalized version of this is to use a parser generator implemented shunting yard algorythm, f.ex. using yacc or bison or taccle (tcl analog of yacc).

Peter

Adam Davis
+2  A: 

I have posted source for an ultra compact (1 class, < 10 KiB) Java Math Evaluator on my web site. This is a recursive descent parser of the type that caused the cranial explosion for the poster of the accepted answer.

It supports full precedence, parenthesis, named variables and single-argument functions.

Software Monkey
+2  A: 

Another resource for precedence parsing is the Operator-precedence parser entry on Wikipedia. Covers Dijkstra's shunting yard algorithm, and a tree alternate algorithm, but more notably covers a really simple macro replacement algorithm that can be trivially implemented in front of any precedence ignorant parser:

#include <stdio.h>
int main(int argc, char *argv[]){
  printf("((((");
  for(int i=1;i!=argc;i++){
    if(argv[i] && !argv[i][1]){
      switch(argv[i]){
      case '^': printf(")^("); continue;
      case '*': printf("))*(("); continue;
      case '/': printf("))/(("); continue;
      case '+': printf(")))+((("); continue;
      case '-': printf(")))-((("); continue;
      }
    }
    printf("%s", argv[i]);
  }
  printf("))))\n");
  return 0;
}

Invoke it as:

$ cc -o parenthesise parenthesise.c
$ ./parenthesise a \* b + c ^ d / e
((((a))*((b)))+(((c)^(d))/((e))))

Which is awesome in its simplicity, and very understandable.

Adam Davis
That's quite a nice little pearl. But extending it (say, with function application, implicit multiplication, prefix and postfix operators, optional type annotations, anything) would break the whole thing. In other words, it's an elegant hack.
Jared Updike
+5  A: 

Have you thought about using Boost Spirit? It allows you to write EBNF-like grammars in C++ like this:

group       = '(' >> expression >> ')';
factor      = integer | group;
term        = factor >> *(('*' >> factor) | ('/' >> factor));
expression  = term >> *(('+' >> term) | ('-' >> term));
Zifre
+1 And the upshot is, everything is part of Boost. The grammar for the calculator is here: http://spirit.sourceforge.net/distrib/spirit_1_8_5/libs/spirit/example/fundamental/tree_calc_grammar.hpp. The implementation of the calculator is here: http://spirit.sourceforge.net/distrib/spirit_1_8_5/libs/spirit/example/fundamental/ast_calc.cpp. And the documentation is here: http://spirit.sourceforge.net/distrib/spirit_1_8_5/libs/spirit/doc/trees.html. I will never understand why people still implement there own mini-parsers.
stephan
Way cool. Thanks for sharing.
Jared Updike
A: 

A Python solution using pyparsing can be found here. Parsing infix notation with various operators with precedence is fairly common, and so pyparsing also includes the operatorPrecedence expression builder. With it you can easily define boolean expressions using "AND", "OR", "NOT", for example. Or you can expand your four-function arithmetic to use other operators, such as ! for factorial, or '%' for modulus, or add P and C operators to compute permutations and combinations. You could write an infix parser for matrix notation, that includes handling of '-1' or 'T' operators (for inversion and transpose). The operatorPrecedence example of a 4-function parser (with '!' thrown in for fun) is here and a more fully featured parser and evaluator is here.

Paul McGuire
+2  A: 

http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm

Very good explanation of different approaches:

  • Recursive-descent recognition
  • The shunting yard algorithm
  • The classic solution
  • Precedence climbing

Written in simple language and pseudo-code.

I like 'precedence climbing' one.

Alsin
Great find, thanks!
Adam Davis