views:

88

answers:

2

I have a binary tree based mathematical expression parser I built, which works great for 'normal' math, like: (3.5 * 2) ^ 1 / (1 << 6). however, I would like to expand it a little to add a ternary selection operator, mirroring the one from C: {expr} ? {true-expr} : {false-expr}. I would also like to add functions, like sin(x) or ave(...).

I however have no clue to how the handle this (due to the way the evaluation works), nor can I find anything on the web that covers this, atleast in a non-grammer based way (I'd like to avoid grammer parser generators for this, if possible).

My parser current works by evaluating an infix expression and immediatly converting it to a tree, then from there the tree can be evaluated, ie: its you bog standard expression tree.

currently my evaluator looks like so:

struct Node
{
    int nType;
    union
    {
        unsigned long dwOperator;
        BOOL bValue;
        int nValue; //for indices, args & functions
        number_t fValue;
        char* szValue; //for string literals to pass to functions
    };

    Node* pLeft;
    Node* pRight;
};

number_t EvaluateTree(Node* pNode)
{
    if(pNode == NULL)
        return 0.0f;

    int nType = pNode->nType;
    if(nType == TOKEN_OPERATOR)
    {
        number_t fLeft = EvaluateTree(pNode->pLeft);
        number_t fRight = EvaluateTree(pNode->pRight);
        switch(pNode->dwOperator)
        {
            case '+': return fLeft + fRight;
            case '-': return fLeft - fRight;
            case '*': return fLeft * fRight;
            case '/': return fLeft / fRight;
            case '^': return pow(fLeft,fRight);
            case '_': return pow(fLeft,1.0f/fRight); 
            case '%': return fmod(fLeft,fRight);

            //case '?': return bSelect = ?;
            //case ':': return (bSelect) ? fLeft : fRight;

            //case '>': return fLeft > fRight;
            //case '<': return fLeft < fRight;
            //case '>=': return fLeft >= fRight;
            //case '<=': return fLeft <= fRight;
            //case '==': return fLeft == fRight;
            //case '!=': return fLeft != fRight;
            //case '||': return fLeft || fRight;
            //case '&&': return fLeft && fRight;

            case '&': return static_cast<number_t>(static_cast<unsigned long>(fLeft) & static_cast<unsigned long>(fRight));
            case '|': return static_cast<number_t>(static_cast<unsigned long>(fLeft) | static_cast<unsigned long>(fRight));
            case '~': return static_cast<number_t>(~static_cast<unsigned long>(fRight));
            case '>>': return static_cast<number_t>(static_cast<unsigned long>(fLeft) >> static_cast<unsigned long>(fRight));
            case '<<': return static_cast<number_t>(static_cast<unsigned long>(fLeft) << static_cast<unsigned long>(fRight));

            default:  
                {
                    printf("ERROR: Invalid Operator Found\n");
                    return 0.0f;
                }
        }
    }
    else if(nType == TOKEN_NUMBER)
        return pNode->fValue;
    else if(nType == TOKEN_CALL)
        return CreateCall(pNode); //not implemented
    else if(nType == TOKEN_GLOBAL)
        return GetGlobal(pNode);
    else if(nType == TOKEN_ARGUMENT)
        return GetArgument(pNode);
    else if(nType == TOKEN_STRING)
        return 0.0f;

    return 0.0f;
}

Any tips/pointers/advice or useful links on how I can accomplish this?


A small set of examples (as requested):

What I already have working

Input: 2 * (3 ^ 1.5) - 4 / (1 << 3)

Output: In-Order: 2.0 * 3.0 ^ 1.5 - 4.0 / 1.0 << 3.0

Pre-Order: - * 2.0 ^ 3.0 1.5 / 4.0 << 1.0 3.0

Post-Order: 2.0 3.0 1.5 ^ * 4.0 1.0 3.0 << / -

Result: 9.892304

What I want to add

Input: (GetDay() == 31) ? -15.5 : 8.4

Output: 8.4

Output on the 31st: -15.5

Input: max([0],20) (where [0] denotes argument 0, and [0] = 35)

Output: 20

Input: (GetField('employees','years_of_service',[0]) >= 10) ? 0.15 : 0.07 (where [0] is argument 0, and [0] is set to a valid index)

Output (if years_of_service for the emplyee is less than 10: 0.15

else Output: 0.07

Its basically math with some C inspired additions, except arguments aren't passed by name, but rather index, and strings are escaped by single quotes instead doubles.

When once I have the final bit done, I'm hoping to either bytecode compile or JIT it, as I'm planing to use this for things like games or math reliant programs, where the input set data is constant, but the input set can change, but its being used frequently, so it needs to be 'fast', and it needs to be usable by non-programmers.

+1  A: 

The correct thing to do for ? and : depends on the tree produced by the parser. I will pretend the parser generates a tree like

      ?
  b       :
        t   f

First you need to not evaluate the trees before the switch, and most places you change something like

fLeft + fRight;

into

EvaluateTree(pNode->pLeft) + EvaluateTree(pNode->pRight);

With + replaced by all the various operators.

For ?: you do ....

case ':': return 0.0f; /* this is an error in the parse tree */
case '?': if (!(pNode && pNode->pLeft && pNode->pRight &&
                pNode->pRight->pLeft && pNode->pRight->pRight))
             /* another error in the parse tree */
             return 0.0f;
          return EvaluateBool(pNode->pLeft) ?
                   EvaluateTree(pNode->pRight->pLeft) :
                   EvaluateTree(pNode->pRight->pRight) ;

For a definition of EvaluateBool you have a couple choices. The C way is more or less

BOOL EvaluateBool(Node* pNode)
{
    return (EvaluateTree(pNode) == 0.0) ? FALSE : TRUE;
}

Then you need definitions for '<' and friends that return 0.0 for false, and anything else for true. The value -1 is a very popular true value, though generally for storing bools in ints.

The more structured way is to move all the operators like '<' that return booleans into the body of EvaluateBool, and make it work more-or-less like EvaluateTree does.

Finally, instead of making the ternary operator ?: use two nodes, you could also change the definition of the node (and the parser) to have up to three sub trees, then most operators would have two trees, but ?: would have three. Maybe something like

case '?': return EvaluateBool(pNode->pLeft) ?
                   EvaluateTree(pNode->pMiddle) : 
                   EvaluateTree(pNode->pRight) ;

But then you'll have to rewrite your pre-order, in-order, post-order tree traversals.

Second part, functions. One way you could do it is store the name of the function in szValue. Another is have a bunch of different values for nType depending on the function. You will have to pick some rule in the parser, and use it here in the interpreter. You could do something like...

else if(nType == TOKEN_CALL)
    return EvaluateFunc(pNode);

Then EvaluateFunc could look something like

number_t EvaluateFunc(Node* pNode)
{
    if ((pNode == NULL) || (pNode->szValue == NULL))
        return 0.0f;
    if (0 == strcmp('cos', pNode->szValue))
        return my_cos(EvaluateTree(pNode->pLeft));
    else if (0 == strcmp('gcd', pNode->szValue))
        return my_gcd(EvaluateTree(pNode->pLeft),
                      EvaluateTree(pNode->pRight));
    /* etc */
    else /* unknown function */ return 0.0f;
}

Looks like a fun project, enjoy!

jsl4tv
Necrolis
And a bit farther down that road and you are in the land of OO design. Each kind of operator would be a different kind of tree node, most taking two children, but all handling evaluate, pre-order, post-order, in-order type methods. There is no rule against doing OO in C or even Fortran for that matter. Enjoy!
jsl4tv
well my inspired idea didn't get too far, however, I your idea of using left|middle|right nodes is working great for function parsing(use use each middle to hold an expression tree for each arg), and same goes for using it for boolean evaluation. just got some final tweaks to do then its off to making this in a nice OO manner :)
Necrolis
+1  A: 

I think you should change your "Node" struct to have an array of children, instead of "pLeft" and "pRight". A function like sin() has one argument/child. The conditional (ternary) operator has three arguments/children.

Tom Sirgedas