views:

263

answers:

1

Guys,

My parser evaluates PEMDAS expressions by first converting from infix to postfix then uses the standard postfix evaluation rules. I parse the expression and store the tokens in a list. This precompilation is ok for me since I plan on caching the precompiled functions.

I am trying to optimize the evaluate function (See Evaluate04 in code). On my hardware, I can get 1,000,000 evaluations in less than 600ms. Truthfully, this is fast enough I believe. Far faster than the databases access calls to get the expressions.

I wanted to see if you guys can make it faster. Micro-optimization, complete refactoring of classes. It doesn't matter how its done.

This is as fast as I've been able to get it, can you improve it?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

namespace ParseTest2
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        enum Operator {Add, Subtract, Multiply, Divide}

        enum TokenType { Variable, Value, Add, Subtract, Multiply, Divide };

        private class TokenFactory
        {
            public static Token Add
            {
                get { return new Token(TokenType.Add); }
            }
            public static Token Subtract
            {
                get { return new Token(TokenType.Subtract); }
            }
            public static Token Multiply
            {
                get { return new Token(TokenType.Multiply); }
            }
            public static Token Divide
            {
                get { return new Token(TokenType.Divide); }
            }
            public static Token Val(float value)
            {
                return new Token(value);
            }
            public static Token Var(string variable)
            {
                return new Token(variable);
            }  
        }

        private class Token
        {
            public readonly TokenType TokenType;
            public float Value;
            public readonly string Variable;

            public Token(TokenType tokenType)
            {
                TokenType = tokenType;
                //Value = 0;
                //Variable = null;                
            }
            public Token(float value)
            {
                TokenType = TokenType.Value;
                Value = value;
                //Variable = null;
            }
            public Token(string variable)
            {
                //TokenType = TokenType.Variable;
                Variable = variable;
                //Value = 0;
            }
            public override string ToString()
            {
                return
                    Expression.IsOperand(this) ?
                    string.Format("{0} {1} {2}", TokenType, Variable ?? "", Value) :
                    string.Format("{0}", TokenType);
            }
        }

        class Expression
        {
            List<Token> _tokens;

            public Action<Token> SetVariableValue;

            public Expression(string expression)
            {
                //time to convert to postfix from infix
                var infix = expression.Split(' ');
                string postfix = string.Empty;

                Action<string> add = s => {postfix += " " + s;};
                var ops = new Stack<string>();

                Func<string, int> getPrecedence = value =>
                {
                    switch(value)
                    {
                        case "*":
                        case "/":
                            return 1;
                        case "+":
                        case "-":
                            return 0;

                    }
                    return -1;
                };

                Func<string, bool> isLowerOrEqualPrecedence = val => 
                {
                    if (ops.Count == 0) return false;
                    var op = ops.Peek();
                    int cur = getPrecedence(val);
                    int top = getPrecedence(op);
                    return cur <= top;
                };

                foreach (string val in infix)
                {
                    if ("-+*/".Contains(val))
                    {
                        if (ops.Count == 0)
                            ops.Push(val);
                        else
                        {
                            if (!isLowerOrEqualPrecedence(val))
                            {
                                ops.Push(val);
                            }
                            else
                            {
                                while (isLowerOrEqualPrecedence(val))
                                {
                                    add(ops.Pop());
                                }
                                ops.Push(val);
                            }
                        }
                    }
                    else if (val == "(")
                        ops.Push(val);
                    else if (val == ")")
                    {
                        while (true)
                        {
                            var op = ops.Pop();
                            if (op == "(") break;
                            add(op);
                        }
                    }
                    else
                    {
                        add(val);
                    }                    
                }

                while (ops.Count > 0)
                {
                    add(ops.Pop());
                }


                _tokens = new List<Token>();
                foreach (string x in postfix.Trim().Split(' '))
                { 
                    switch(x)
                    {
                        case "*":
                            _tokens.Add(TokenFactory.Multiply);
                            break;
                        case "/":
                            _tokens.Add(TokenFactory.Divide);
                            break;
                        case "+":
                            _tokens.Add(TokenFactory.Add);
                            break;                        
                        case "-":
                            _tokens.Add(TokenFactory.Subtract);
                            break;
                        default:
                            _tokens.Add(TokenFactory.Var(x));
                            break;
                    }
                }                             
            }

            public static bool IsOperand(Token tk)
            {
                return tk.TokenType == TokenType.Variable || tk.TokenType == TokenType.Value;
            }



            public float Evaluate04()
            {
                Token[] tokens = _tokens.ToArray();

                int i;

                for (i = tokens.Length - 1; i >= 0; --i)
                    if (tokens[i].TokenType == TokenType.Variable)
                        SetVariableValue(tokens[i]);

                int tokenCount = tokens.Length;

                while (true)
                {
                    int i1 = 0, i2 = 0, i3 = 0;
                    //i1 = i2 = i3 = -1;
                    int j;
                    Token t1, t2, t3;

                    //looking for operator preceded by two operands
                    for (i = 0; i < tokens.Length - 2; ++i)
                    //i = 0;
                    //while( true )
                    {
                        t1 = null;
                        t2 = null;
                        t3 = null;

                        j = i;
                        do
                        {
                            t1 = tokens[j];
                            i1 = j++;
                        } while (t1 == null);


                        do
                        {
                            t2 = tokens[j];
                            i2 = j++;
                        } while (t2 == null);

                        do
                        {
                            t3 = tokens[j];
                            i3 = j++;
                        } while (t3 == null);

                        //it's a binary operation if t3 is an operator and t2, t1 are operands
                        //if (!IsOperand(t3) && IsOperand(t2) && IsOperand(t1))
                        if (
                            !(t3.TokenType == TokenType.Variable || t3.TokenType == TokenType.Value) &&
                             (t2.TokenType == TokenType.Variable || t2.TokenType == TokenType.Value) &&
                             (t1.TokenType == TokenType.Variable || t1.TokenType == TokenType.Value))
                        {
                            float val;

                            if (t3.TokenType == TokenType.Add)
                                val = t1.Value + t2.Value;
                            else if (t3.TokenType == TokenType.Divide)
                                val = t1.Value / t2.Value;
                            else if (t3.TokenType == TokenType.Subtract)
                                val = t1.Value - t2.Value;
                            else if (t3.TokenType == TokenType.Multiply)
                                val = t1.Value * t2.Value;
                            else
                                val = int.MinValue;

                            //we're done, return
                            if (tokenCount == 3)
                                return val;

                            tokenCount -= 2;

                            //ok, we are done processing these, null them
                            tokens[i1] = new Token(val);
                            tokens[i2] = null;
                            tokens[i3] = null;

                            //break, for loop and repeat until done
                            break;
                        }
                    }
                }
            }
        }
        private void Form1_Load(object sender, EventArgs e)
        {

            Action<Token> setVariableValue = token =>
            {
                if (token.Variable == "A")
                    token.Value = 4;
                else if (token.Variable == "B")
                    token.Value = 5;
                else if (token.Variable == "C")
                    token.Value = 7;
                else if (token.Variable == "D")
                    token.Value = 2;
            };

            //spaces are required because I'm splitting on them.
            //I know it's lame, it's a detail I'll address later...
            var x = new Expression("( A + B ) / ( C + D )");
            x.SetVariableValue = setVariableValue;
            Func<float> eval = x.Evaluate04;
            eval();
            int count = 1000000;
            float res = 0;

            var sw = new System.Diagnostics.Stopwatch();

            //sw.Start();
            //Parallel.ForEach(Enumerable.Range(1, count), i =>
            //{
            //    res = eval();
            //});
            //sw.Stop();
            //Console.WriteLine("{0} takes: {1}", "parallel", sw.ElapsedMilliseconds);

            sw.Reset();
            sw.Start();

            //for(int i=0; i<count; ++i)
            foreach (int i in Enumerable.Range(1, count))
            {
                res = eval();
            }
            sw.Stop();
            Console.WriteLine("{0} takes: {1}", "single thread", sw.ElapsedMilliseconds);
            Console.WriteLine("{0}", res);

        }

        private void Form1_KeyDown(object sender, KeyEventArgs e)
        {
            if (e.KeyCode == Keys.Escape)
                Close();

        }         
    }
}
+1  A: 

Processing from parse-tree to RPN should not be a performance issue if you only do it once and do the evaluation many times.

What is all this token jazz, and indexing token arrays?

Why not just have a stack, like this:

for (i = 0; i < n; i++){
  switch(op[i]){
    case VARIABLE:
      stk[j++] = <value_of_variable>;
      break;
    case PLUS:
      temp = stk[--j];
      stk[j-1] += temp;
      break;
    // and so on...
  }
}

In the inner loop you shouldn't be doing any memory allocation. The most costly thing you should be doing is looking up the values of variables.

The easiest way to find out what is taking the most time is this.

The second-easiest way is just single-step through it at the disassembly level. If you find it going into any system routines like new, put a stop to that, fast. Same goes for iterators.

Mike Dunlavey
Thanks for the feedback, I tried the postfix algorithm and the stack did not perform as well as the array implementation I posted. I know there's a few more LOC than with the postfix algorithm. I think I'm at the point with it that I should look at the disassembly. I agree, newing objs is too expensive and iterators can slow it down as well.
Steve
@Steve: Good luck. My favorite method is to put it in a long-lasting loop and then manually pause it at random, maybe several times. That instantly spots any unwanted function calls. Then single-stepping when I'm down to hotspots.
Mike Dunlavey