views:

137

answers:

2

Hi

I have a simple LALR(1) grammar, but I'm encountering a problem.

start ::= spec.
spec ::= MOD STRING top_stmt.
spec ::= top_stmt.
top_stmt ::= stmt.
top_stmt ::= conditional.
stmt ::= expr.
stmt ::= assignment.
conditional ::= IF stmt_list.
expr ::= retval.
expr ::= NOT retval.
retval ::= access.
retval ::= invoke.
access ::= ns_identifier OBJECT_OPERATOR property_chain.
access ::= ns_identifier.
ns_identifier ::= identifier.
ns_identifier ::= ns_identifier NS_SEPARATOR identifier.
ns_identifier ::=.
property_chain ::= property_chain OBJECT_OPERATOR identifier.
property_chain ::= identifier.
identifier ::= VARIABLE.
identifier ::= STRING.
assignment ::= access ASSIGN expr. [ASSIGN]
stmt_list ::= stmt.
stmt_list ::= stmt_list COMMA stmt. [COMMA]
invoke ::= access LPAREN empty_stmt_list RPAREN.
empty_stmt_list ::=.
empty_stmt_list ::= stmt.
empty_stmt_list ::= empty_stmt_list COMMA stmt. [COMMA]

The dot marks the end of the rule, terminals between brackets have associativity assigned to them: ASSIGN is right-associative, COMMA is left-assoc.

But lemon says it can't reduce the rule "empty_stmt_list ::=." because it's not connected to the start symbol. I bet it is :-)

There is also a parsing conflict for "invoke", it cannot decide between RPAREN and COMMA when empty_stmt_list is indeed an empy list of statements.

What I'm trying to achieve is being able to parse function calls which have no (void) parameters.

Everything else works as expected.

Thanks

Edit: I have edited my original post and posted the entire stripped down grammar.

A: 

You may try adding precedence rules for LPAREN/COMMA and check if it affects the semantics in other places.

artificialidiot
+1  A: 

Your first problem is that I don't think this bit does what you want it to:

invoke ::= access LPAREN empty_stmt_list RPAREN.
empty_stmt_list ::=.
empty_stmt_list ::= stmt.
empty_stmt_list ::= empty_stmt_list COMMA stmt. [COMMA]

The invoke production will match access LPAREN COMMA stmt RPAREN, which I assume is not desirable (and is where the LPAREN/COMMA conflict is coming from).

You can fix that by doing it like like this, which makes use of your existing stmt_list rule:

invoke ::= access LPAREN maybe_empty_stmt_list RPAREN.
maybe_empty_stmt_list ::= .
maybe_empty_stmt_list ::= stmt_list.

That still reports a conflict (but only 1 now) and still complains that maybe_empty_stmt_list ::=. can't be reduced. So, looking in the XXX.out file to see what it is:

State 2:
...
     (16) ns_identifier ::= *
...
     (25) maybe_empty_stmt_list ::= *
...
                        RPAREN reduce 25  ** Parsing conflict **
....
                     {default} reduce 16

...it appears that the problem is with the ns_identifier ::=. rule. Working back through the productions involved, it's not too difficult to see that an empty ns_identifier can be reduced to stmt_list (via ns_identifier -> access -> retval -> expr -> stmt -> stmt_list).

That explains the conflict; and fact that the ns_identifier ::=. rule is favoured in this case because it appears earlier in the grammar (see the rules for resolving reduce-reduce conflicts in the documentation explains why it complains that the maybe_empty_stmt_list ::=. rule can never be reduced.

Matthew Slattery
Thanks a bunch, I fixed it altogether, including removing the empty ns_identfier rule which was actually an artifact to solve another older problem. Notes to self: designing language parsers needs care and the "do it right" mentality. Accepted answer.
Flavius