views:

693

answers:

7

I have an idea for a simple program to make that will help me with operator precedence in languages like C. The most difficult part of this is parenthesizing the expression. For example, I want this:

*a.x++ = *b.x++

Converted to this:

((*(((a).(x))++)) = (*(((b).(x))++)))

Which I did manually in these steps:

           *a.x++ = *b.x++
       *(a).(x)++ = *(b).(x)++
     *((a).(x))++ = *((b).(x))++
   *(((a).(x))++) = *(((b).(x))++)
 (*(((a).(x))++)) = (*(((b).(x))++))
((*(((a).(x))++)) = (*(((b).(x))++)))

What is the best way to accomplish this programmatically? Is there already a solution out there that I could use? I'd prefer to do this in either PHP, C, C++, Python, or Ruby.

(This isn't the whole idea of my program, it is only the first step.)

+6  A: 

You're going to need a parser of some sort that understands operator precendence. The usual version for C is Lexx/Yacc or flex/bison, and the easiest way to do it is construct a parse tree. Once you've done that, just walk the parse tree in the "preorder" order and emit parens as you enter and leave a node.

Charlie Martin
+4  A: 

The most reliable way will be to parse the expression (taking into account precedence rules, of course) and then process the resulting AST (Abstract Syntax Tree) in a top-down manner, adding parenthesis as you move along

Anton Gogolev
+1  A: 

You could create a binary expression tree from the operators.

I believe there are several algorithms available online to create such tree already.

One simple way I could think of, is by sorting the operator by precedence, and then split the string into 2 parts by the operator with the lowest precedence first, then continue recursively splitting the other 2 parts down over and over and then eventually, you'll have the expression in binary tree form.

And then when you have the expression in binary tree form, you can then "parenthesize" up from the tree's leaves up to the root.

And of course, you could compile a full-fledged parser via yacc/bison.

chakrit
+2  A: 

How about converting to postfix and evaluating. Can you try if the following approach works. Lets take *a.x++

Operator    Precedence  Arguments Needed
.           3               2
++          2               1
*           1               1

So now convert the expression to postfix notation. This should give you

a x . ++ *

Now evaluation of postfix is as simple as pushing things into a stack, when you hit an operator, pop the top n (as needed by operator) elements and pass them as arguments, store results back onto the stack. In your case, instead of evaluating, you'd return a textual representation of the operation

     Stack
Read a   a
Read x   x
Read .   (a.x)
Read ++  ((a.x)++)
Read *   (*((a.x)++))

if it helps, you may want to look at:
http://www.spsu.edu/cs/faculty/bbrown/web_lectures/postfix/
Bart de smet's DynCalc series of posts
My attempt at TDDing a similar solution

Gishu
+1  A: 

As a simple example:

Exp = Term | Exp, AddOp, Term 
Term = Factor | Term, MulOp, Factor
Factor = Number | Ident | PreOp, Factor | (, Exp, ) | Factor, PostOp

You can uses the grammar to write the translations:

Exp    = Term                    -> Term
       | Exp, AddOp, Term        -> (, Exp, AddOp, Term, )
Term   = Factor                  -> Factor
       | Term, MulOp, Factor     -> (, Term, MulOp, Factor, )
Factor = Number                  -> Number
       | Ident                   -> Ident
       | PreOp, Factor           -> (, PreOp, Factor, )
       | (, Exp, )               -> (, Exp, )
       | Factor, PostOp          -> (, Factor, PostOp, )

In which case:

a-- + b * (a+b)

Translates to:

((a--) + (b * ((a+b))))
Gamecat
+1  A: 

Parsing is a huge topic. Since you just want to use it to solve a specific problem, try not to get immersed in all these specific parsing algorithms that people are suggesting. Rather, there are numerous parser generators, such as antler or bison which, given an appropriate grammar, will parse text and allow you to perform programmatic operations on the components -- such as put parentheses around them. Some of these systems come with grammars for C, or have such grammars available.

antlr can generate parsers in any of the languages you mentioned; see http://www.antlr.org/

+1  A: 

Just pick up a parser for your selected language, for instance C parser, parse the expression/source code and print the AST back in the way you want.

test.c:

void main(void){
    int c = 2;
}

terminal:

$ python
>>> import pycparser
>>> test = pycparser.parse_file('test.c')
>>> test.show()
FileAST: 
  FuncDef: 
    Decl: main, [], []
      FuncDecl: 
        ParamList: 
          Typename: []
            TypeDecl: None, []
              IdentifierType: ['void']
        TypeDecl: main, []
          IdentifierType: ['void']
    Compound: 
      Decl: c, [], []
        TypeDecl: c, []
          IdentifierType: ['int']
        Constant: int, 2
>>> for node in test.ext:
...     print node
...
<pycparser.c_ast.FuncDef object at 0x7fe1436db750>
>>>
Cheery