Ambiguous grammar, not LALR(1), unparsable by default yacc mode
To make a long story short, you can "fix" this with a %glr-parser
declaration as follows:
%glr-parser
%%
start: rule1_list
. . .
. . .
To make a long story kind of medium-length...
Shift-reduce conflicts are normally not errors. The conflict is resolved by always doing the shift which is usually what you want. Most or all real-world grammars have shift-reduce conflicts. And if you wanted the reduction you can arrange for that with precedence declarations.
However, in a truely ambiguous grammar, doing the shift will send the parser down one of two paths, only one of which will ultimately find a string in the grammar. In this case the S/R conflict is a fatal error.
Analyzing the first one, when the parser sees the newline in the | rule2 NEWLINE rule3_list
case, it can either shift to a new state where it will be expected a rule3_list, or it can reduce a rule1 using rule1: rule2
. It will always look for a rule3_list due to the default choice of shift.
The second conflict occurs when it sees a newline in rule3_list: rule3_list . NEWLINE rule3
. Now it can either shift and start looking for rule3 or reduce a rule1 using | rule2 NEWLINE rule3_list
.
The result is that as written, and assuming '2' and '3' for the terminals, you can only parse a 2 line followed by 3 lines. If you fiddle with the precedence you can only parse '2' lines and never '3' lines.
Finally, I should add that going with a yacc-generated GLR parser is something of a kludge. I imagine it will work just fine but it's pure BFI, the parser splits, keeps two stacks, continues down both paths until one finds a string in the grammar. Sadly, the other fixes are kludges too: 1. reformulate the grammar as LALR(1), 2. add extra lookahead in the scanner and return a composite token, 3. Experiment with the rules for the grammar you have, perhaps yacc can handle a variation.
This is why I don't actually like yacc and prefer hand-written recursive descent or something more modern like PEG. (See Treetop.)
I tried something with (the preferred) left-recursive rules that simply ignored the newlines (which complicate your grammar, making whitespace tokens...) .. and this "works", although I'm not sure if it's what you want...
%%
start: stmtList
;
stmtList: /* nothing */
| stmtList '2' threeList;
;
threeList: /* nothing */
| threeList '3'
;
%%
int yylex() { int c; do { c = getchar (); } while (c == '\n'); return c; }