1660

16
+14  Q:

## Expression Evaluation and Tree Walking using polymorphism? (ala Steve Yegge)

This morning, I was reading Steve Yegge's: When Polymorphism Fails, when I came across a question that a co-worker of his used to ask potential employees when they came for their interview at Amazon.

As an example of polymorphism in action, let's look at the classic "eval" interview question, which (as far as I know) was brought to Amazon by Ron Braunstein. The question is quite a rich one, as it manages to probe a wide variety of important skills: OOP design, recursion, binary trees, polymorphism and runtime typing, general coding skills, and (if you want to make it extra hard) parsing theory.

At some point, the candidate hopefully realizes that you can represent an arithmetic expression as a binary tree, assuming you're only using binary operators such as "+", "-", "*", "/". The leaf nodes are all numbers, and the internal nodes are all operators. Evaluating the expression means walking the tree. If the candidate doesn't realize this, you can gently lead them to it, or if necessary, just tell them.

Even if you tell them, it's still an interesting problem.

The first half of the question, which some people (whose names I will protect to my dying breath, but their initials are Willie Lewis) feel is a Job Requirement If You Want To Call Yourself A Developer And Work At Amazon, is actually kinda hard. The question is: how do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree. We may have an ADJ challenge on this question at some point.

The second half is: let's say this is a 2-person project, and your partner, who we'll call "Willie", is responsible for transforming the string expression into a tree. You get the easy part: you need to decide what classes Willie is to construct the tree with. You can do it in any language, but make sure you pick one, or Willie will hand you assembly language. If he's feeling ornery, it will be for a processor that is no longer manufactured in production.

You'd be amazed at how many candidates boff this one.

I won't give away the answer, but a Standard Bad Solution involves the use of a switch or case statment (or just good old-fashioned cascaded-ifs). A Slightly Better Solution involves using a table of function pointers, and the Probably Best Solution involves using polymorphism. I encourage you to work through it sometime. Fun stuff!

So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?

Feel free to tackle one, two, or all three.

[update: title modified to better match what most of the answers have been.]

+2  A:

String Tokenizer + LL(1) Parser will give you an expression tree... the polymorphism way might involve an abstract Arithmetic class with an "evaluate(a,b)" function, which is overridden for each of the operators involved (Addition, Subtraction etc) to return the appropriate value, and the tree contains Integers and Arithmetic operators, which can be evaluated by a post(?)-order traversal of the tree.

A:

should use a functional language imo. Trees are harder to represent and manipulate in OO languages.

+1  A:

Can someone just show me how your represent `2 + (2)` as a BST?

Or maybe this is the real question: how can you represent `(2)` as a BST? That is the part that is tripping me up.

Obviously, 2 + 2 is easy.

``````  +
/ \
2   2
``````

The problem, I think, is that we need to parse perentheses, and yet they are not a binary operator? Should we take `(2)` as a single token, that evaluates to `2`?

A:

Or maybe this is the real question: how can you represent (2) as a BST? That is the part that is tripping me up.

Recursion.

+2  A:

The problem, I think, is that we need to parse perentheses, and yet they are not a binary operator? Should we take (2) as a single token, that evaluates to 2?

The parens don't need to show up in the expression tree, but they do affect its shape. E.g., the tree for (1+2)+3 is different from 1+(2+3):

``````    +
/ \
+   3
/ \
1   2
``````

versus

``````    +
/ \
1   +
/ \
2   3
``````

The parentheses are a "hint" to the parser (e.g., per superjoe30, to "recursively descend")

+1  A:

Re: Justin

I think the tree would look something like this:

``````  +
/ \
2  ( )
|
2
``````

Basically, you'd have an "eval" node, that just evaluates the tree below it. That would then be optimized out to just being:

``````  +
/ \
2   2
``````

In this case the parens aren't required and don't add anything. They don't add anything logically, so they'd just go away.

+4  A:

This gets into parsing/compiler theory, which is kind of a rabbit hole... The Dragon Book is the standard text for compiler construction, and takes this to extremes. In this particular case, you want to construct a context-free grammar for basic arithmetic, then use that grammar to parse out an abstract syntax tree. You can then iterate over the tree, reducing it from the bottom up (it's at this point you'd apply the polymorphism/function pointers/switch statement to reduce the tree).

I've found these notes to be incredibly helpful in compiler and parsing theory.

Here's a minimal CFG for the original question:S -> NN -> 2N -> N ON -> ( N )O -> - N
+6  A:

Polymorphic Tree Walking, Python version

``````#!/usr/bin/python

class Node:
"""base class, you should not process one of these"""
def process(self):
raise('you should not be processing a node')

class BinaryNode(Node):
"""base class for binary nodes"""
def __init__(self, _left, _right):
self.left = _left
self.right = _right
def process(self):
raise('you should not be processing a binarynode')

class Plus(BinaryNode):
def process(self):
return self.left.process() + self.right.process()

class Minus(BinaryNode):
def process(self):
return self.left.process() - self.right.process()

class Mul(BinaryNode):
def process(self):
return self.left.process() * self.right.process()

class Div(BinaryNode):
def process(self):
return self.left.process() / self.right.process()

class Num(Node):
def __init__(self, _value):
self.value = _value
def process(self):
return self.value

def demo(n):
print n.process()

demo(Num(2))                                       # 2
demo(Plus(Num(2),Num(5)))                          # 2 + 3
demo(Plus(Mul(Num(2),Num(3)),Div(Num(10),Num(5)))) # (2 * 3) + (10 / 2)
``````

The tests are just building up the binary trees by using constructors.

program structure:

abstract base class: Node

• all Nodes inherit from this class

abstract base class: BinaryNode

• all binary operators inherit from this class
• process method does the work of evaluting the expression and returning the result

binary operator classes: Plus,Minus,Mul,Div

• two child nodes, one each for left side and right side subexpressions

number class: Num

• holds a leaf-node numeric value, e.g. 17 or 42
This answer is horribly over-engineered. The question was for 2 + (2), not an arbitrary arithmetic calculation. Also, this just executes the tree, it doesn't build it.
The question was for an arithmetic calculation SUCH AS 2 + (2), not the calculation of 2 + (2). Therefore, it's not over-engineered, but answers the question as intended.
This answer is not to the question of "How do you go from an arithmetic expression (e.g. in a STRING) such as "2 + (2)"....Where does "demo(Plus(Mul(Num(2),Num(3)),Div(Num(10),Num(5))))" come from? That other program we don't get to see?Why is this marked as best answer?
"You get the easy part: you need to decide what classes Willie is to construct the tree with."
A:

@Justin:

Look at my note on representing the nodes. If you use that scheme, then

``````2 + (2)
``````

can be represented as

``````           .
/ \
2  ( )
|
2
``````
+3  A:

Representing the Nodes

If we want to include parentheses, we need 5 kinds of nodes:

• the binary nodes: Add Minus Mul Div
these have two children, a left and right side

``````     +
/ \
node   node
``````
• a node to hold a value: Val
no children nodes, just a numeric value

• a node to keep track of the parens: Paren
a single child node for the subexpression

``````    ( )
|
node
``````

For a polymorphic solution, we need to have this kind of class relationship:

• Node
• BinaryNode : inherit from Node
• Plus : inherit from Binary Node
• Minus : inherit from Binary Node
• Mul : inherit from Binary Node
• Div : inherit from Binary Node
• Value : inherit from Node
• Paren : inherit from node

There is a virtual function for all nodes called eval(). If you call that function, it will return the value of that subexpression.

There is no reason to include parentheses in the abstract syntax tree.
+1  A:

I won't give away the answer, but a Standard Bad Solution involves the use of a switch or case statment (or just good old-fashioned cascaded-ifs). A Slightly Better Solution involves using a table of function pointers, and the Probably Best Solution involves using polymorphism.

The last twenty years of evolution in interpreters can be seen as going the other way - polymorphism (eg naive Smalltalk metacircular interpreters) to function pointers (naive lisp implementations, threaded code, C++) to switch (naive byte code interpreters), and then onwards to JITs and so on - which either require very big classes, or (in singly polymorphic languages) double-dispatch, which reduces the polymorphism to a type-case, and you're back at stage one. What definition of 'best' is in use here?

For simple stuff a polymorphic solution is OK - here's one I made earlier, but either stack and bytecode/switch or exploiting the runtime's compiler is usually better if you're, say, plotting a function with a few thousand data points.

+1  A:

I think the question is about how to write a parser, not the evaluator. Or rather, how to create the expression tree from a string.

Case statements that return a base class don't exactly count.

The basic structure of a "polymorphic" solution (which is another way of saying, I don't care what you build this with, I just want to extend it with rewriting the least amount of code possible) is deserializing an object hierarchy from a stream with a (dynamic) set of known types.

The crux of the implementation of the polymorphic solution is to have a way to create an expression object from a pattern matcher, likely recursive. I.e., map a BNF or similar syntax to an object factory.

MSN

A:

As people have been mentioning previously, when you use expression trees parens are not necessary. The order of operations becomes trivial and obvious when you're looking at an expression tree. The parens are hints to the parser.

While the accepted answer is the solution to one half of the problem, the other half - actually parsing the expression - is still unsolved. Typically, these sorts of problems can be solved using a recursive descent parser. Writing such a parser is often a fun exercise, but most modern tools for language parsing will abstract that away for you.

The parser is also significantly harder if you allow floating point numbers in your string. I had to create a DFA to accept floating point numbers in C -- it was a very painstaking and detailed task. Remember, valid floating points include: 10, 10., 10.123, 9.876e-5, 1.0f, .025, etc. I assume some dispensation from this (in favor of simplicty and brevity) was made in the interview.

+2  A:

Hm... I don't think you can write a top-down parser for this without backtracking, so it has to be some sort of a shift-reduce parser. LR(1) or even LALR will of course work just fine with the following (ad-hoc) language definition:

Start -> E1
E1 -> E1+E1 | E1-E1
E1 -> E2*E2 | E2/E2 | E2
E2 -> number | (E1)

Separating it out into E1 and E2 is necessary to maintain the precedence of * and / over + and -.

But this is how I would do it if I had to write the parser by hand:

• Two stacks, one storing nodes of the tree as operands and one storing operators
• Read the input left to right, make leaf nodes of the numbers and push them into the operand stack.
• If you have >= 2 operands on the stack, pop 2, combine them with the topmost operator in the operator stack and push this structure back to the operand tree, unless
• The next operator has higher precedence that the one currently on top of the stack.

This leaves us the problem of handling brackets. One elegant (I thought) solution is to store the precedence of each operator as a number in a variable. So initially,

• int plus, minus = 1;
• int mul, div = 2;

Now every time you see a a left bracket increment all these variables by 2, and every time you see a right bracket, decrement all the variables by 2.

This will ensure that the + in 3*(4+5) has higher precedence than the *, and 3*4 will not be pushed onto the stack. Instead it will wait for 5, push 4+5, then push 3*(4+5).

A:

I've written such a parser with some basic techniques like Infix -> RPN and Shunting Yard and Tree Traversals. Here is the implementation I've came up with.
It's written in C++ and compiles on both Linux and Windows.
Any suggestions/questions are welcomed.

So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?

This is interesting,but I don't think this belongs to the realm of object-oriented programming...I think it has more to do with parsing techniques.

A:

I've kind of chucked this c# console app together as a bit of a proof of concept. Have a feeling it could be a lot better (that switch statement in GetNode is kind of clunky (it's there coz I hit a blank trying to map a class name to an operator)). Any suggestions on how it could be improved very welcome.

``````using System;

class Program
{
static void Main(string[] args)
{
string expression = "(((3.5 * 4.5) / (1 + 2)) + 5)";
Console.WriteLine(string.Format("{0} = {1}", expression, new Expression.ExpressionTree(expression).Value));
Console.WriteLine("\nShow's over folks, press a key to exit");
}
}

namespace Expression
{
// -------------------------------------------------------

abstract class NodeBase
{
public abstract double Value { get; }
}

// -------------------------------------------------------

class ValueNode : NodeBase
{
public ValueNode(double value)
{
_double = value;
}

private double _double;
public override double Value
{
get
{
return _double;
}
}
}

// -------------------------------------------------------

abstract class ExpressionNodeBase : NodeBase
{
protected NodeBase GetNode(string expression)
{
// Remove parenthesis
expression = RemoveParenthesis(expression);

// Is expression just a number?
double value = 0;
if (double.TryParse(expression, out value))
{
return new ValueNode(value);
}
else
{
int pos = ParseExpression(expression);
if (pos > 0)
{
string leftExpression = expression.Substring(0, pos - 1).Trim();
string rightExpression = expression.Substring(pos).Trim();

switch (expression.Substring(pos - 1, 1))
{
case "+":
case "-":
return new Subtract(leftExpression, rightExpression);
case "*":
return new Multiply(leftExpression, rightExpression);
case "/":
return new Divide(leftExpression, rightExpression);
default:
throw new Exception("Unknown operator");
}
}
else
{
throw new Exception("Unable to parse expression");
}
}
}

private string RemoveParenthesis(string expression)
{
if (expression.Contains("("))
{
expression = expression.Trim();

int level = 0;
int pos = 0;

foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}

if (level == 0)
{
break;
}
}

if (level == 0 && pos == expression.Length)
{
expression = expression.Substring(1, expression.Length - 2);
expression = RemoveParenthesis(expression);
}
}
return expression;
}

private int ParseExpression(string expression)
{
int winningLevel = 0;
byte winningTokenWeight = 0;
int winningPos = 0;

int level = 0;
int pos = 0;

foreach (char token in expression.ToCharArray())
{
pos++;

switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}

if (level <= winningLevel)
{
if (OperatorWeight(token) > winningTokenWeight)
{
winningLevel = level;
winningTokenWeight = OperatorWeight(token);
winningPos = pos;
}
}
}
return winningPos;
}

private byte OperatorWeight(char value)
{
switch (value)
{
case '+':
case '-':
return 3;
case '*':
return 2;
case '/':
return 1;
default:
return 0;
}
}
}

// -------------------------------------------------------

class ExpressionTree : ExpressionNodeBase
{
protected NodeBase _rootNode;

public ExpressionTree(string expression)
{
_rootNode = GetNode(expression);
}

public override double Value
{
get
{
return _rootNode.Value;
}
}
}

// -------------------------------------------------------

abstract class OperatorNodeBase : ExpressionNodeBase
{
protected NodeBase _leftNode;
protected NodeBase _rightNode;

public OperatorNodeBase(string leftExpression, string rightExpression)
{
_leftNode = GetNode(leftExpression);
_rightNode = GetNode(rightExpression);

}
}

// -------------------------------------------------------

{
: base(leftExpression, rightExpression)
{
}

public override double Value
{
get
{
return _leftNode.Value + _rightNode.Value;
}
}
}

// -------------------------------------------------------

class Subtract : OperatorNodeBase
{
public Subtract(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}

public override double Value
{
get
{
return _leftNode.Value - _rightNode.Value;
}
}
}

// -------------------------------------------------------

class Divide : OperatorNodeBase
{
public Divide(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}

public override double Value
{
get
{
return _leftNode.Value / _rightNode.Value;
}
}
}

// -------------------------------------------------------

class Multiply : OperatorNodeBase
{
public Multiply(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}

public override double Value
{
get
{
return _leftNode.Value * _rightNode.Value;
}
}
}
}
``````
Nice to see a full implementation, but I am bit confused why you combined the parsing logic with the expression tree representation, creating a tight coupling of the parsing logic with expression tree.Also, you could generate a map (driven by xml, a database, an in-code dictionary, or a custom attribute applied to each class representing a node) between your tokens and the class they map to. The problem here is the use of reflection (via Activator or some other technique) to generate your classes.
@Merritt mmm good point, might have another crack at this next time I have a free lunch time. Thanks for the suggestions.