views:

1651

answers:

6

I'm having a problem understanding the shift/reduce confict for a grammar that I know has no ambiguity. The case is one of the if else type but it's not the 'dangling else' problem since I have mandatory END clauses delimiting code blocks.

Here is the grammar for gppg (Its a Bison like compiler compiler ... and that was not an echo):

%output=program.cs

%start program

%token FOR
%token END
%token THINGS
%token WHILE
%token SET
%token IF
%token ELSEIF
%token ELSE
%%

program : statements
        ;

statements : /*empty */
           | statements stmt
           ;

stmt : flow
     | THINGS
     ;

flow : '#' IF '(' ')' statements else
     ;

else : '#' END
     | '#' ELSE statements '#' END
     | elseifs
     ;

elseifs : elseifs '#' ELSEIF statements else
        | '#' ELSEIF statements else
        ;

Here is the conflict output:

// Parser Conflict Information for grammar file "program.y"

Shift/Reduce conflict on symbol "'#'", parser will shift
 Reduce 10: else -> elseifs
 Shift "'#'":   State-22 -> State-23
  Items for From-state State 22
    10 else: elseifs .
    -lookahead: '#', THINGS, EOF
    11 elseifs: elseifs . '#' ELSEIF statements else 
  Items for Next-state State 23
    11 elseifs: elseifs '#' . ELSEIF statements else 

// End conflict information for parser

I already switched arround everything, and I do know how to resolve it, but that solution involves giving up the left recursion on 'elseif' for a right recursion.

Ive been through all the scarse documentation I have found on the internet regarding this issue (I post some links at the end) and still have not found an elegant solution. I know about ANTLR and I don't want to consider it right now. Please limit your solution to Yacc/Bison parsers.

I would appreciate elegant solutions, I managed to do It by eleminating the /* empty */ rules and duplication everything that needed an empty list but in the larger grammar Im working on It just ends up like 'sparghetti grammar syndrome'.

Here are some links:

http://nitsan.org/~maratb/cs164/bison.html

http://compilers.iecc.com/comparch/article/98-01-079

GPPG, the parser I'm using

Bison manual

+2  A: 

I think the problem is in the elseifs clause.

elseifs : elseifs '#' ELSEIF statements else
        | '#' ELSEIF statements else
        ;

I think the first version is not required, since the else clause refers back to elseifs anyway:

else : '#' END
     | '#' ELSE statements '#' END
     | elseifs
     ;

What happens if you change elseifs?:

elseifs : '#' ELSEIF statements else
        ;
1800 INFORMATION
You are right. Eleminationg the line resolved the conflict. I reduced my original grammar to this small one to try and pinpoint the problem and ended up introducing a bug. I think I have to redo my original more complex grammar and work it towards this one to work out the solution.
Caerbanog
Darn: just noticed why it worked. It's right side recursion again. The solution I'm after should do left recursion on the elseifs since they can be a long stream.
Caerbanog
A: 

I'm still switching thing arround, and my original question had some errors since the elseifs sequence had an else allways at the end which was wrong. Here is another take at the question, this time I get two shift/reduce conflicts:

flow : '#' IF '(' ')' statements elsebody 
     ;

elsebody : else 
         | elseifs else
         ;

else : '#' ELSE statements '#' END
     | '#' END
     ;

elseifs : /* empty */
        | elseifs '#' ELSEIF statements
        ;

The conflicts now are:

// Parser Conflict Information for grammar file "program.y"

Shift/Reduce conflict on symbol "'#'", parser will shift
 Reduce 12: elseifs -> /* empty */
 Shift "'#'":   State-10 -> State-13
  Items for From-state State 10
    7 flow: '#' IF '(' ')' statements . elsebody 
    4 statements: statements . stmt 
  Items for Next-state State 13
    10 else: '#' . ELSE statements '#' END 
    11 else: '#' . END 
    7 flow: '#' . IF '(' ')' statements elsebody 

Shift/Reduce conflict on symbol "'#'", parser will shift
 Reduce 13: elseifs -> elseifs, '#', ELSEIF, statements
 Shift "'#'":   State-24 -> State-6
  Items for From-state State 24
    13 elseifs: elseifs '#' ELSEIF statements .
    -lookahead: '#'
    4 statements: statements . stmt 
  Items for Next-state State 6
    7 flow: '#' . IF '(' ')' statements elsebody 

// End conflict information for parser

Empty rules just aggravate the gppg i'm affraid. But they seem so natural to use I keep trying them.

I already know right recursion solves the problem as 1800 INFORMATION has said. But I'm looking for a solution with left recursion on the elseifs clause.

Caerbanog
+3  A: 

Your revised ELSEIF rule has no markers for a condition -- it should nominally have '(' and ')' added.

More seriously, you now have a rule for

elsebody : else
         | elseifs else
         ;

and

elseifs : /* Nothing */
        | elseifs ...something... 
        ;

The 'nothing' is not needed; it is implicitly taken care of by the 'elsebody' without the 'elseifs'.

I would be very inclined to use rules 'opt_elseifs', 'opt_else', and 'end':

flow : '#' IF '(' ')' statements opt_elseifs opt_else end
     ;

opt_elseifs : /* Nothing */
            | opt_elseifs '#' ELSIF '(' ')' statements 
            ;

opt_else : /* Nothing */
         | '#' ELSE statements
         ;

end : '#' END
    ;

I've not run this through a parser generator, but I find this relatively easy to understand.

Jonathan Leffler
Yes. I too expected it to work, you are right in stating its the easyiest to understand. but I did ran it on the parser and unfortunatly gave 4 shift/reduce conflicts. Run it through gppg and see for yourself. I dont understand (got to get my 'dragon' compilers manual back)
Caerbanog
The answer works if I take out the '#'. Seems the simbol messes up the lookahead token, probably because the parser is LALR(1)
Caerbanog
A: 
elsebody : elseifs else
         | elseifs
         ;

elseifs : /* empty */
        | elseifs '#' ELSEIF statements
        ;

else : '#' ELSE statements '#' END
     ;

I think this should left recurse and always terminate.

Tried it. Unfortunatly ... 5 shift/reduce conflicts.
Caerbanog
+1  A: 

The answer from Jonathan above seems like it would be the best, but since its not working for you I have a few suggestions you could try that will help you in debugging the error.

Firstly have you considered making the hash/sharp symbol a part of the tokens themselves (i.e. #END, #IF, etc)? So that they get taken out by the lexer, meaning they don't have to be included in the parser.

Secondly I would urge you to rewrite the rules without duplicating any token streams. (Part of the Don't Repeat Yourself principle.) So the rule " '#' ELSEIF statements else " should only exist in one place in that file (not two as you have above).

Lastly I suggest that you look into precedence and associativity of the IF/ELSEIF/ELSE tokens. I know that you should be able to write a parser that doesn't require this but it might be the thing that you need in this case.

Daemin
A: 

OK - here is a grammar (not minimal) for if blocks. I dug it out of some code I have (called adhoc, based on hoc from Kernighan & Plauger's "The UNIX Programming Environment"). This outline grammar compiles with Yacc with no conflicts.

%token  NUMBER IF ELSE
%token  ELIF END
%token  THEN
%start program

%%

program
    :   stmtlist
    ;

stmtlist
    :   /* Nothing */
    |   stmtlist stmt
    ;

stmt
    :   ifstmt
    ;

ifstmt
    :   ifcond endif
    |   ifcond else begin
    |   ifcond eliflist begin
    ;

ifcond
    :   ifstart cond then stmtlist
    ;

ifstart
    :   IF
    ;

cond
    :   '(' expr ')'
    ;

then
    :   /* Nothing */
    |   THEN
    ;

endif
    :   END IF begin
    ;

else
    :   ELSE stmtlist END IF
    ;

eliflist
    :   elifblock
    |   elifcond eliflist begin         /* RIGHT RECURSION */
    ;

elifblock
    :   elifcond else begin
    |   elifcond endif
    ;

elifcond
    :   elif cond then stmtlist end
    ;

elif
    :   ELIF
    ;

begin
    :   /* Nothing */
    ;

end
    :   /* Nothing */
    ;

expr
    :   NUMBER
    ;

%%

I used 'NUMBER' as the dummy element, instead of THINGS, and I used ELIF instead of ELSEIF. It includes a THEN, but that is optional. The 'begin' and 'end' operations were used to grab the program counter in the generated program - and therefore should be removable from this without affecting it.

There was a reason I thought I needed to use right recursion instead of the normal left recursion - but I think it was to do with the code generation strategy I was using, rather than anything else. The question mark in the comment was in the original; I remember not being happy with it. The program as a whole does work - it is a project that's been on the back burner for the last decade or so (hmmm...I did some work at the end of 2004 and beginning of 2005; prior to that, it was 1992 and 1993).

I've not spent the time working out why this compiles conflict-free and what I outlined earlier does not. I hope it helps.

Jonathan Leffler