views:

276

answers:

10

This isn't a school assignment or anything, though I realize it's a mostly academic question. What I've been struggling to do is parse 'math' text and come up with an answer.

For Example - I can figure out how to parse '5 + 5' or '3 * 5' - but I fail when I try to correctly chain operations together.

(5 + 5) * 3

It's mostly just bugging me that I can't figure it out. If anyone can point me in a direction, I'd really appreciate it.

EDIT Thanks for all of the quick responses. I'm sorry I didn't do a better job of explaining.

First - I'm not using regular expressions. I also know there are already libraries available that will take, as a string, a mathematical expression and return the correct value. So, I'm mostly looking at this because, sadly, I don't "get it".

Second - What I've tried doing (probably misguided) was counting '(' and ')' and evaluating the deepest items first. In simple examples, this worked; but my code is not pretty and more complicated stuff crashes. When I 'calculated' the lowest level, I was modifying the string.

So... (5 + 5) * 3

Would turn into 10 * 3

Which would then evaluate to 30

But it just felt 'wrong'.

I hope that helps clarify things. I'll certainly check out the links provided.

+2  A: 

Did you ever take a class on formal languages in school? Effectively you need a grammar to parse by.

EDIT: Oh crap, Wikipedia says I'm wrong, but now I forget the correct name :( http://en.wikipedia.org/wiki/Formal_grammar

drachenstern
Infix notation needs a grammar. Postfix (RPN) can be parsed with a push-down automaton, which is much easier to implement than a grammar.
andand
@andand ~ Aha, I knew there was something about automata in there...
drachenstern
Be careful not to equate (formal) grammar with context-free grammar. A regular language is also generated by a grammar, a regular one.
anno
@anno ~ aren't the basic math ops pretty well able to be described by a formal grammar? But yes, it was context-free that I was thinking initially, along with automata.
drachenstern
+5  A: 

Ages ago when working on a simple graphing app, I used this algorithm (which is reasonably easy to understand and works great for simple math expressions like these) to first turn the expression into RPN and then calculated the result. RPN was nice and fast to execute for different variable values.
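The link to the algorithm did not survive here, so what follows is only a minimal sketch of one well-known way to do what is described: Dijkstra's shunting-yard algorithm to turn infix into RPN, followed by a stack-based RPN evaluation. Whether that is the exact algorithm meant above is an assumption, and the names `tokenize`, `to_rpn`, `eval_rpn` and `OPS` are made up for illustration. It handles only numbers, the four binary operators and parentheses (no unary minus, no functions).

```python
import operator

# Binary operators: symbol -> (precedence, function). All are left-associative.
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def tokenize(text):
    """Split text into float numbers, operator symbols and parentheses."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit() or ch == ".":
            j = i
            while j < len(text) and (text[j].isdigit() or text[j] == "."):
                j += 1
            tokens.append(float(text[i:j]))
            i = j
        elif ch in OPS or ch in "()":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def to_rpn(tokens):
    """Reorder infix tokens into RPN (postfix) using an operator stack."""
    output, stack = [], []
    for tok in tokens:
        if isinstance(tok, float):
            output.append(tok)
        elif tok in OPS:
            # Pop operators of higher or equal precedence (left associativity).
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[tok][0]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        else:  # tok == ")"
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the matching "("
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output


def eval_rpn(rpn):
    """Evaluate an RPN token list with a value stack."""
    stack = []
    for tok in rpn:
        if isinstance(tok, float):
            stack.append(tok)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok][1](left, right))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]
```

With this sketch, `to_rpn(tokenize("(5 + 5) * 3"))` gives `[5.0, 5.0, "+", 3.0, "*"]`, and `eval_rpn` of that list gives `30.0`. The RPN list can be kept and re-evaluated, which is presumably why it was fast for repeated evaluation.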

Of course, language parsing is a very wide topic and there are many other ways of going about it (and pre-made tools for it too)

Matti Virkkunen
This is a great idea for a small calculator parser and it can be implemented fast. But it can also get quite hairy fast if you want to do slightly more complex stuff like function calls (`sin`,`cos`).
shoosh
@shoosh: Actually, function calls could quite easily be implemented as unary operators (although the Wikipedia page seems to ignore them, the algorithm can be extended to take them into account). For multiple parameters, you could introduce a binary comma operator that packs values together.
Matti Virkkunen
drachenstern
@Matti, you're right of course. The shit really hits the fan though when you want the same function name to be overloaded to take either one or two parameters.
shoosh
@shoosh: Nah, if you consider `f(x, y)` to be f applied to a single argument, that is `(x, y)`, then you'll be fine. You could see this argument as a tuple, created by the `,` operator (as Matti alluded to).
Joren
+2  A: 

Last year-ish I wrote a basic math evaluator for reasons I can't remember. It is not a "proper" parser by any stretch of the term, and, like all old code, I'm not that proud of it now.

But you can take a look and see if it helps you.

You run some input tests by launching this standalone Java app

Matt
+1  A: 

Essentially, you are asking us how to write a "parser." Here is another Stack Overflow question about parsers: hand coding a parser

Heath Hunnicutt
A: 

When I wanted to parse something I decided to use the GOLD Parser:

  • Self-contained documentation (don't need a book to understand it)
  • Various run-time engines, in various programming languages including the one I wanted.

The parser includes sample grammars, including e.g. one for operator precedence.


Apart from GOLD there are also other, more famous parser generators, e.g. ANTLR, which I haven't used.

ChrisW
A: 
Rice Flour Cookies
A: 

As many answers have already stated, the issue is that you need a recursive parser with associativity rules because you can end up with expressions like:

val = (2-(2+4+(3-2)))/(2+1)*(2-1)

and your parser needs to know that:

  1. The parenthetic expressions are evaluated from the inside out
  2. Division and multiplication have equal precedence and are evaluated left to right (here you first divide, then multiply the result)
  3. Multiplication and division take precedence over addition/subtraction
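To make the order concrete, here is the example expression worked through by hand as plain Python arithmetic. This is just an illustration under the usual convention that `/` and `*` share one precedence level and group left to right, so the division happens first because it is leftmost; the variable names are made up.

```python
# Innermost parentheses first.
inner = 3 - 2            # 1
middle = 2 + 4 + inner   # 7
left = 2 - middle        # -5
divisor = 2 + 1          # 3
factor = 2 - 1           # 1

# "/" and "*" group left to right: divide first, then multiply the result.
val = (left / divisor) * factor   # roughly -1.667

# The same rule on a case where the grouping changes the answer:
left_to_right = (8 / 4) * 2    # 4.0, the conventional reading of 8 / 4 * 2
multiply_first = 8 / (4 * 2)   # 1.0, what "multiply before divide" would give
```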

As you can imagine, writing a (good) parser is an art. The good thing is that there are several tools, called parser generators, which allow you to easily define the grammar of your language and the parsing rules. You may want to check the entries in Wikipedia for BNF, so that you can see how a grammar is defined.

Finally, if you are doing this for learning experience, go ahead. If this is for production code, do not reinvent the wheel, and find an existing library, otherwise you risk spending 1000 lines of code to add 2+2.

Arrieta
Andre Artus
+1  A: 

Here is a simple (naive operator precedence) grammar for what you want.

expression = 
    term
    | expression "+" term
    | expression "-" term .
term = 
    factor
    | term "*" factor
    | term "/" factor .
factor = 
    number
    | "(" expression ")" .

When you process "factor" you just check whether the next token is a number or "(". If it's a "(", you parse "expression" again; when expression returns, you check that the next token is ")". You could have the calculated (or read) values bubble up to the parent through the use of out or ref parameters, or build an expression tree.

Here is the same thing in EBNF:

expression = 
    term
    { "+" term | "-" term  } .

term = 
    factor
    { "*" factor | "/" factor }.

factor = 
    number
    | "(" expression ")" .
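As a rough illustration of how the EBNF maps onto code, here is a small recursive-descent sketch with one method per rule. Instead of out or ref parameters, each method simply returns the computed value. The class and function names (`Parser`, `tokenize`, `evaluate`) are made up, and the tokenizer only knows unsigned numbers, the four operators and parentheses.

```python
import re

TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[-+*/()]")


def tokenize(text):
    """Return a list of lexemes; raise on anything the grammar does not know."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # expression = term { "+" term | "-" term } .
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term = factor { "*" factor | "/" factor } .
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor = number | "(" expression ")" .
        tok = self.advance()
        if tok == "(":
            value = self.expression()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return value
        if tok is None or tok in "+-*/)":
            raise ValueError(f"expected number or '(', got {tok!r}")
        return float(tok)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expression()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value
```

For example, `evaluate("(5 + 5) * 3")` returns `30.0`. The `while` loops correspond to the `{ ... }` repetitions in the EBNF, and they are also what makes the operators group left to right.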
Andre Artus
+1  A: 

@Rising Star [I hoped to add this as a comment, but the formatting failed]

It may seem counterintuitive, but a binary tree is both simpler and more flexible. A node, in this case, would be either a constant (number) or an operator. A binary tree makes life somewhat easier when you decide to extend the language with elements like control flow, and functions.

Example:

((3 + 4 - 1) * 5 + 6 * -7) / 2

                  '/'
                /     \
              +        2
           /     \
         *         *
       /   \     /   \
      -     5   6     -7
    /   \
   +     1
 /   \
3     4

In the case above the scanner has been programmed to read '-' followed by a series of digits as a single number, so "-7" gets returned as the value component of the "number" token. '-' followed by whitespace is returned as a "minus" token. This makes the parser somewhat easier to write. It fails on the case where you want "-(x * y)", but you can easily change the expression to "0 - exp".
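A minimal sketch of such a tree, assuming two node types named `Constant` and `BinaryOperator` (hypothetical names, as are `FUNCS` and `evaluate`). The tree from the diagram is built by hand here rather than by a parser, and evaluation is a simple recursive walk.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass
class Constant:
    value: float


@dataclass
class BinaryOperator:
    symbol: str
    left: "Node"
    right: "Node"


Node = Union[Constant, BinaryOperator]

FUNCS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Evaluate children first, then apply the operator at this node."""
    if isinstance(node, Constant):
        return node.value
    return FUNCS[node.symbol](evaluate(node.left), evaluate(node.right))


# ((3 + 4 - 1) * 5 + 6 * -7) / 2, with -7 stored as a single constant.
tree = BinaryOperator(
    "/",
    BinaryOperator(
        "+",
        BinaryOperator(
            "*",
            BinaryOperator(
                "-",
                BinaryOperator("+", Constant(3), Constant(4)),
                Constant(1),
            ),
            Constant(5),
        ),
        BinaryOperator("*", Constant(6), Constant(-7)),
    ),
    Constant(2),
)
```

Here `evaluate(tree)` gives `-6.0`: the left branch is (3 + 4 - 1) * 5 = 30, the right branch is 6 * -7 = -42, and (30 - 42) / 2 = -6.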

Andre Artus
A simple application of the composite pattern will see inner (composite) nodes as "BinaryOperator" and leaf nodes as "Constant". This is of course no longer strictly speaking a binary tree.
Andre Artus
I have to correct myself: "-" immediately followed by "[0-9]" becomes a number, any other instance of "-" is returned as the MINUS token. A simple RegEx that returns a stream of "tokens" would be: "(-?[0-9]+|[-+*/()]|[a-z][a-z0-9]*|<=|>=|<|>|=)". This handles identifiers and relational operators. It treats unspecified lexemes as whitespace.
Andre Artus
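A small sketch of a regex-driven scanner along those lines, using Python's `re` module. The pattern places `-` first in the character class so it is taken literally rather than as a range, and lets identifiers be a single character. The token kinds and the names `classify` and `tokens` are made up for illustration.

```python
import re

TOKEN_RE = re.compile(r"-?[0-9]+|[-+*/()]|[a-z][a-z0-9]*|<=|>=|<|>|=")


def classify(lexeme):
    """Attach a token kind to a raw lexeme."""
    if lexeme.lstrip("-").isdigit():
        return ("number", int(lexeme))
    if lexeme[0].isalpha():
        return ("identifier", lexeme)
    return ("symbol", lexeme)


def tokens(text):
    """Scan text left to right; anything the pattern does not match is skipped."""
    return [classify(m.group()) for m in TOKEN_RE.finditer(text)]
```

For instance, `tokens("6 * -7")` gives `[("number", 6), ("symbol", "*"), ("number", -7)]`, while `tokens("4 - 1")` keeps the minus as its own symbol. Note that `"4-1"` without spaces scans as the two numbers 4 and -1, so this scanner relies on whitespace around a binary minus.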