I am trying to write a parser rule that allows zero or more occurrences of a token before a second rule. In the AST, each successive token matched by the closure should be a child of the previous token, and the result of the second rule should be a child of the last token.

This is easier to explain by example:

expression11 : ((NOT | COMPLEMENT)^)* expression12;

For example, given the above parser rule, if I have the expression !!x (where x is an ID), I want, in my AST, the x to be the child of the second bang operator which is the child of the first.

Desired:

!
  \ child
    !
      \ child
       x

Instead, the rule above produces an AST in which the second bang operator is a child of the first, but the x is also a child of the first bang operator, making it a sibling of the second one. That is obviously not what I want for a unary operator.

Encountered behavior:

        !
child /   \ child
    x -sib- !

If I add a third operator (as in "!!!x") the third one becomes a child of the second, as expected, and x remains a child of the first, sibling of the second.

I thought perhaps I could fix this by surrounding the entire operator part with parentheses and adding another caret, such as

expression11 : (((NOT | COMPLEMENT)^)*)^ expression12;

in an effort to force expression12 to be a child of the entire closure of operators, hoping in vain that this would be interpreted as "The child of the entire closure means the child of the most-descended," but that was not the case and doing this did not change the behavior at all.

My question is "How do I get the parser to process the rule in such a way that the result of expression12 becomes the child of the most-descended 'NOT' or 'COMPLEMENT' node instead of the highest ancestor one?"

I would have thought this would be simple, but I cannot figure it out from the ANTLR resources on antlr.org, nor by pleading with Google. It must be done all the time. Or is there a different way to structure the rule entirely that I am overlooking?

For completeness, here are the rules that follow expression11. They are not finished yet and will be modified, but they work for testing, which is expected since they are straightforward. expression12 handles array length and method calls, expression13 handles new classes and arrays, expression14 handles array indexing, and expression15 handles terminals and parentheses.

expression12 : expression13 (DOT (LENGTH | (ID LPAREN (expression (COMMA expression)*)? RPAREN)))?;
expression13 : expression14 | (NEW^ ((ID LPAREN RPAREN) | (INTTYPE LSQBRACK expression RSQBRACK)));
expression14 : expression15 (LSQBRACK expression RSQBRACK)*;
expression15 : (LPAREN expression RPAREN) | INTLIT | TRUE | FALSE | ID | THIS;

Thank you to anyone who can provide assistance; your time is much appreciated.

+1  A: 

You must not use the Kleene star if you don't want operators to appear as siblings. Try something like this (untested):

expression11 : (NOT | COMPLEMENT)^ expression11
             | expression12;
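
To see why the recursive form nests and the starred form does not, here is a rough Python model of how the `^` suffix appears to behave in ANTLR 3: each `^` token becomes the root of the tree built so far within the current rule invocation, and anything without a suffix is appended as a child of that root. This is only a sketch under that assumption. The names `Node`, `become_root`, `add_child`, `finish`, `parse_with_loop` and `parse_recursive` are made up for illustration and are not the ANTLR runtime API, and expression12 is reduced to a single operand token.

```python
class Node:
    """Minimal AST node. text=None marks an empty ("nil") placeholder root."""

    def __init__(self, text=None):
        self.text = text
        self.children = []

    def is_nil(self):
        return self.text is None

    def to_string_tree(self):
        # LISP-style rendering: (root child child ...)
        if not self.children:
            return str(self.text)
        inner = " ".join(c.to_string_tree() for c in self.children)
        return inner if self.is_nil() else f"({self.text} {inner})"


def become_root(new_root, old_root):
    """Model of the '^' suffix: new_root takes over as the rule's root."""
    if old_root.is_nil():
        new_root.children.extend(old_root.children)
    else:
        new_root.children.append(old_root)
    return new_root


def add_child(root, child):
    """Model of an element with no suffix: append it to the current root."""
    if child.is_nil():
        root.children.extend(child.children)
    else:
        root.children.append(child)


def finish(root):
    """At the end of a rule, a nil root with one child collapses to that child."""
    if root.is_nil() and len(root.children) == 1:
        return root.children[0]
    return root


OPERATORS = {"!", "~"}  # stand-ins for NOT and COMPLEMENT


def parse_with_loop(tokens):
    """expression11 : ((NOT | COMPLEMENT)^)* expression12 ;"""
    root = Node()
    i = 0
    while i < len(tokens) and tokens[i] in OPERATORS:
        # Every operator re-roots the SAME rule-level tree.
        root = become_root(Node(tokens[i]), root)
        i += 1
    # The operand is attached to whatever the rule's root is now.
    add_child(root, Node(tokens[i]))
    return finish(root)


def parse_recursive(tokens, i=0):
    """expression11 : (NOT | COMPLEMENT)^ expression11 | expression12 ;"""
    root = Node()
    if tokens[i] in OPERATORS:
        root = become_root(Node(tokens[i]), root)
        # The rest of the input is a complete subtree built by its own rule call.
        add_child(root, parse_recursive(tokens, i + 1))
    else:
        add_child(root, Node(tokens[i]))
    return finish(root)


print(parse_with_loop(["!", "!", "x"]).to_string_tree())   # (! ! x)
print(parse_recursive(["!", "!", "x"]).to_string_tree())   # (! (! x))
```

In the loop version there is only one rule invocation, so x can only be attached to that invocation's single root and ends up next to an operator. In the recursive version every operator gets its own invocation, so each one has exactly one child: the subtree returned by the nested call.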
Martin v. Löwis
After using your answer, I also found the same solution on the ANTLR mailing list (http://antlr.1301665.n2.nabble.com/A-little-trouble-with-parsing-unary-operators-td5067826.html#a5067826). I thought it had to exist somewhere. Anyway, it is simple and works great; thank you very much.
Loduwijk