I'm working on a school project that involves converting a Decaf specification written in BNF into a context-free grammar and building it in ANTLR. I've been at it for a few weeks and have gone to my professor whenever I got stuck, but I've finally run into something that he says should not be causing an error. Before I show the isolated part of my grammar (expr is the start rule), I have one question.

Does it matter whether my lexer rules appear before my parser rules, or whether they're interspersed throughout the grammar file?

calloutarg:         expr | STRING;
expr:  multexpr ((PLUS|MINUS) multexpr)* ;
multexpr : atom ((MULT|DIVISION) atom)*
;

atom : OPENPAR expr CLOSEPAR | ID ((OPENBRACKET expr CLOSEBRACKET)? | OPENPAR ((expr (COMMA)* )+)? CLOSEPAR)|
CALLOUT OPENPAR STRING (COMMA (calloutarg)+ COMMA)? CLOSEPAR | constant;
constant: INT | CHAR | boolconstant;
boolconstant: TRUE|FALSE;

The ugly formatting is because part of his debugging advice was to take individual rules and break them apart where the ambiguity is, to see where the errors start. In this case, ANTLR reports that the problem is in the long ID alternative, with OPENBRACKET and OPENPAR as the cause. I'd deeply appreciate any ideas. Thank you, and sorry for how nasty the formatting of the code is.

A:

Does it matter if my lexer rules appear before my parser rules in my grammar ...

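No, that does not matter. (Only the order of the lexer rules relative to each other matters: when two lexer rules match exactly the same input, ANTLR gives priority to the one defined first.)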

The problem is that inside your atom rule, ANTLR cannot make a choice between these three variants:

  1. ID ( ...
  2. ID [ ...
  3. ID

without (possibly) resorting to backtracking. You could resolve this with syntactic predicates, which look like (...)=> ... . A syntactic predicate is nothing more than a lookahead: if the lookahead succeeds, the parser chooses that particular alternative.

Your current atom rule can be rewritten as follows:

atom 
  :  OPENPAR expr CLOSEPAR
  |  ID OPENPAR ((expr (COMMA)* )+)? CLOSEPAR 
  |  ID OPENBRACKET expr CLOSEBRACKET
  |  ID
  |  CALLOUT OPENPAR STRING (COMMA (calloutarg)+ COMMA)? CLOSEPAR
  |  constant
  ;

And with the predicates it will look like:

atom 
  :  OPENPAR expr CLOSEPAR
  |  (ID OPENPAR)=>     ID OPENPAR ((expr (COMMA)* )+)? CLOSEPAR 
  |  (ID OPENBRACKET)=> ID OPENBRACKET expr CLOSEBRACKET
  |  ID
  |  CALLOUT OPENPAR STRING (COMMA (calloutarg)+ COMMA)? CLOSEPAR
  |  constant
  ;

which should do the trick.
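To make "commit to the alternative whose lookahead succeeds" concrete, here is a minimal hand-written Python sketch of the decision the predicated atom rule makes. It is not the code ANTLR generates: tokens are plain strings borrowed from the grammar's token names, and the function names (peek, choose_atom_branch) and the branch labels are hypothetical. Alternatives are checked in the order they appear in the rule, so an ID followed by OPENPAR is always taken as a call, which is how the predicate settles which reading wins.

```python
from typing import List, Optional

def peek(toks: List[str], i: int) -> Optional[str]:
    """Return the token at position i, or None past the end of the input."""
    return toks[i] if i < len(toks) else None

def choose_atom_branch(toks: List[str], i: int) -> str:
    """Choose an alternative of atom starting at toks[i] (illustrative only).

    Alternatives are checked in the order of the predicated rule; the first
    lookahead that matches wins, just as (ID OPENPAR)=> commits to the call
    alternative whenever ID is followed by OPENPAR.
    """
    first, second = peek(toks, i), peek(toks, i + 1)
    if first == "OPENPAR":
        return "paren"      # OPENPAR expr CLOSEPAR
    if first == "ID" and second == "OPENPAR":
        return "call"       # (ID OPENPAR)=> ID OPENPAR ... CLOSEPAR
    if first == "ID" and second == "OPENBRACKET":
        return "index"      # (ID OPENBRACKET)=> ID OPENBRACKET expr CLOSEBRACKET
    if first == "ID":
        return "id"         # plain ID: the fallback once both predicates fail
    if first == "CALLOUT":
        return "callout"    # CALLOUT OPENPAR STRING ... CLOSEPAR
    if first in ("INT", "CHAR", "TRUE", "FALSE"):
        return "constant"   # constant / boolconstant
    raise SyntaxError(f"no atom alternative starts with {first!r}")

# ID OPENPAR ... is always treated as a call; a lone ID is just an ID.
print(choose_atom_branch(["ID", "OPENPAR", "ID", "CLOSEPAR"], 0))  # call
print(choose_atom_branch(["ID", "PLUS", "INT"], 0))                # id
```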

Note: do not use ANTLRWorks to generate or test the parser! It does not handle predicates (well). Generate and test it from the command line instead.

Also see: https://wincent.com/wiki/ANTLR_predicates


EDIT

Let's label the six branches of your atom rule A to F:

atom                                                            // branch
  :  OPENPAR expr CLOSEPAR                                      //   A
  |  ID OPENBRACKET expr CLOSEBRACKET                           //   B
  |  ID OPENPAR ((expr COMMA*)+)? CLOSEPAR                      //   C
  |  ID                                                         //   D
  |  CALLOUT OPENPAR STRING (COMMA calloutarg+ COMMA)? CLOSEPAR //   E
  |  constant                                                   //   F
  ;

Now, when the (generated) parser has to handle input like this:

ID OPENPAR expr CLOSEPAR

ANTLR does not know how the parser should handle it. It could be parsed in two different ways:

  1. branch D followed by branch A
  2. branch C

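That is the ambiguity ANTLR is complaining about. Branch D can be directly followed by branch A because your argument lists allow it: (expr COMMA*)+ in branch C and calloutarg+ in branch E both accept two expressions in a row with no comma between them. This is also why left-factoring the ID alternatives does not help: it only moves the decision into the subrule after ID, where the parser still has to decide whether an OPENPAR starts the argument list of a call or a new, parenthesized expression. If you were to comment out one of the branches A, C or D, the error would disappear.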
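To see the ambiguity concretely, here is a small Python sketch (an illustration, not something ANTLR produces) that counts the distinct ways a token list can be parsed as an argument list. It simplifies expr to a single atom, drops constants, CALLOUT and the operators, and uses short token spellings ("ID", "(", ")", "[", "]", ","); all function names are hypothetical. It compares the argument list from branch C, (expr COMMA*)+, with a conventional comma-separated list, (expr (COMMA expr)*)?, which is swapped in here for comparison and is not part of your grammar.

```python
from typing import Callable, List

Tokens = List[str]
ArgsRule = Callable[[Tokens, int], List[int]]

def at(toks: Tokens, j: int, tok: str) -> bool:
    """True if position j holds the token tok."""
    return j < len(toks) and toks[j] == tok

def atom(toks: Tokens, i: int, args: ArgsRule) -> List[int]:
    """One end position per distinct derivation of a simplified atom at i.

    Branches: A '(' expr ')', B ID '[' expr ']', C ID '(' args ')', D ID,
    with expr reduced to a single atom.
    """
    ends: List[int] = []
    if at(toks, i, "("):                                         # branch A
        ends += [j + 1 for j in atom(toks, i + 1, args) if at(toks, j, ")")]
    if at(toks, i, "ID"):
        nxt = i + 1
        if at(toks, nxt, "["):                                   # branch B
            ends += [j + 1 for j in atom(toks, nxt + 1, args) if at(toks, j, "]")]
        if at(toks, nxt, "("):                                   # branch C
            ends += [j + 1 for j in args(toks, nxt + 1) if at(toks, j, ")")]
        ends.append(nxt)                                         # branch D
    return ends

def args_original(toks: Tokens, i: int) -> List[int]:
    """((expr COMMA*)+)? : expressions, each followed by any number of commas."""
    ends = [i]                                   # the empty argument list
    frontier = [i]
    while frontier:                              # each round parses one more "expr COMMA*"
        nxt: List[int] = []
        for p in frontier:
            for j in atom(toks, p, args_original):
                nxt.append(j)                    # expr with no trailing comma
                while at(toks, j, ","):
                    j += 1
                    nxt.append(j)                # expr followed by one more comma
        ends += nxt
        frontier = nxt
    return ends

def args_fixed(toks: Tokens, i: int) -> List[int]:
    """(expr (COMMA expr)*)? : a conventional comma-separated list."""
    ends = [i]                                   # the empty argument list
    frontier = atom(toks, i, args_fixed)         # the first expr
    while frontier:
        ends += frontier
        frontier = [j for p in frontier if at(toks, p, ",")
                    for j in atom(toks, p + 1, args_fixed)]
    return ends

def count_parses(toks: Tokens, args: ArgsRule) -> int:
    """Number of distinct derivations of the whole token list as one argument list."""
    return sum(1 for end in args(toks, 0) if end == len(toks))

# ID ( ID ) inside an argument list: a call (branch C), or branch D followed by branch A?
tokens = ["ID", "(", "ID", ")"]
print(count_parses(tokens, args_original))  # 2
print(count_parses(tokens, args_fixed))     # 1
```

With the comma-separated version only the call reading survives, so changing the argument lists is another way to make the error go away besides predicates, provided a comma-separated list is what your Decaf spec actually intends.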

Hope that helps.

Bart Kiers
Thanks for the help! One question, though: why wouldn't my left-factoring take care of the ID | ID [ | ID ( problem?
Nick
@Nick, see the **EDIT** to my answer.
Bart Kiers