LR parsers can't handle ambiguous grammar rules, by design. (That made the theory easier back in the 1970s, when the ideas were being worked out.)
C and C++ both allow the following statement:
x * y ;
It has two different parses:
- It can be the declaration of y as a pointer to type x.
- It can be a multiplication of x and y, throwing away the result.
Now, you might think the latter is stupid and should be ignored.
Most would agree with you; however, there are cases where it might
have a side effect (e.g., if multiplication is overloaded in C++). But that isn't the point.
The point is there are two different parses.
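A minimal C++ sketch of both readings of the same tokens. The namespaces, `Num`, and `multiply_calls` are made up for illustration. When `x` names a type, `x * y;` declares `y`; when `x` is an object, it is an expression statement. The second case uses an overloaded `operator*` to show a side effect that must not be discarded; that case exists only in C++, since C has no operator overloading.

```cpp
// Case 1: x names a type, so "x * y;" declares y as a pointer to x.
namespace as_declaration {
struct x { int value = 42; };

bool run() {
    x * y;              // declaration: y has type x*
    x target;
    y = &target;        // compiles only because y is a pointer to x
    return y->value == 42;
}
}  // namespace as_declaration

// Case 2: x and y are objects whose operator* has a side effect,
// so "x * y;" is an expression statement that must be kept.
namespace as_expression {
struct Num { int v; };
int multiply_calls = 0;

Num operator*(Num a, Num b) {
    ++multiply_calls;   // the side effect
    return Num{a.v * b.v};
}

int run() {
    Num x{6}, y{7};
    x * y;              // expression: result discarded, side effect kept
    return multiply_calls;
}
}  // namespace as_expression
```

`as_declaration::run()` returns true, and the first call to `as_expression::run()` returns 1, showing that the "useless" multiplication still ran.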
The compiler must accept the appropriate one under the appropriate circumstances. In the absence of any other information (e.g., knowledge of the type of x), it must keep both so it can decide later which one applies. Thus the grammar must allow both, and that makes the grammar ambiguous.
Thus LR can't handle this.
There are lots of more complicated cases, but it only takes one to shoot down pure LR parsing.
Most real C/C++ parsers handle this by using some
kind of deterministic parser intertwined with symbol table
collection, so that by the time "x" is encountered,
the parser knows whether x is a type, and can thus
choose between the two potential parses. But a parser
that does this isn't context-free, and LR parsers
(the pure ones) are context-free.
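Here is a toy sketch of that symbol-table feedback, assuming statements arrive as token lists and that only two forms matter: `typedef BASE NAME ;` and `A * B ;`. `SymbolTable`, `parse_statement`, and `Parse` are hypothetical names, not taken from any real compiler. The point is that the answer for `A * B ;` depends on what earlier statements put into the table, which is exactly why such a parser is no longer context-free.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy: each statement is a list of tokens; only two forms are recognized.
enum class Parse { Declaration, Expression, Other };

// Names declared as types so far; filled in while parsing proceeds.
class SymbolTable {
public:
    void add_type(const std::string& name) { types_.insert(name); }
    bool is_type(const std::string& name) const { return types_.count(name) > 0; }
private:
    std::set<std::string> types_;
};

Parse parse_statement(const std::vector<std::string>& toks, SymbolTable& table) {
    // "typedef BASE NAME ;" : record NAME as a type for later statements.
    if (toks.size() == 4 && toks[0] == "typedef" && toks[3] == ";") {
        table.add_type(toks[2]);
        return Parse::Declaration;
    }
    // "A * B ;" : the choice depends on what the table says about A right now.
    if (toks.size() == 4 && toks[1] == "*" && toks[3] == ";") {
        return table.is_type(toks[0]) ? Parse::Declaration : Parse::Expression;
    }
    return Parse::Other;
}
```

Fed `x * y ;`, then `typedef int x ;`, then `x * y ;` again, it returns `Expression`, `Declaration`, `Declaration`: the same tokens get different parses depending only on what came earlier.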
One can cheat and add checks to the reductions an LR parser proposes,
to do this disambiguation.
And if you cheat enough, you can make LR parsers work for
C and C++. The GCC guys did so for a while, but gave it
up for a hand-coded recursive-descent parser, I think because they wanted
better error diagnostics.
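A rough sketch of such a check, assuming a grammar where both a hypothetical `type_name` and a hypothetical `primary_expr` nonterminal can derive a bare identifier, so that after shifting `x` in `x * y ;` a pure LR table proposes both reductions (a reduce/reduce conflict). The `Reduction` struct and its `guard` predicate are illustrative inventions, not the API of any parser generator; the guard consults the known type names and vetoes the wrong reduction.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical: one reduction an LR table might propose, plus the extra check.
struct Reduction {
    std::string lhs;                                  // nonterminal to reduce to
    std::function<bool(const std::string&)> guard;    // semantic check; empty means "always allowed"
};

// The two competing reductions for an identifier, each guarded by a symbol-table lookup.
std::vector<Reduction> identifier_reductions(const std::set<std::string>& type_names) {
    return {
        {"type_name",    [type_names](const std::string& id) { return type_names.count(id) > 0; }},
        {"primary_expr", [type_names](const std::string& id) { return type_names.count(id) == 0; }},
    };
}

// Keep only the proposals whose guard accepts the identifier.
std::vector<std::string> viable_reductions(const std::vector<Reduction>& proposals,
                                           const std::string& ident) {
    std::vector<std::string> out;
    for (const Reduction& r : proposals) {
        if (!r.guard || r.guard(ident)) out.push_back(r.lhs);
    }
    return out;
}
```

With `{"x"}` as the type set, `x` survives only as `type_name` and any other identifier only as `primary_expr`. With the guards left empty, both reductions survive, which is the conflict a pure LR parser is stuck with.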
There's another approach, though, which is nice and clean
and parses C and C++ just fine without any symbol table
hackery: GLR parsers.
These are full context-free parsers: they handle any context-free grammar,
including ambiguous ones, and have effectively infinite lookahead.
GLR parsers simply accept both parses,
producing a "tree" (actually a directed acyclic graph that is mostly tree-like)
that represents the ambiguous parse.
A post-parsing pass can resolve the ambiguities.
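A sketch of the output shape and the later resolution pass, assuming the parser has already produced both parses of `a * b ;`. This is not a GLR parser; `Node`, `Kind`, `make`, `parse_star_statement`, and `resolve` are hypothetical names. The two alternatives hang under one ambiguity node and share the same identifier leaves, which is what makes the result a DAG rather than a strict tree. The post-parse pass keeps the alternative that agrees with the type names found by name resolution.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds for the single statement form "A * B ;".
enum class Kind { Identifier, Declaration, Multiply, Ambiguity };

struct Node {
    Kind kind;
    std::string name;                          // set only for Identifier leaves
    std::vector<std::shared_ptr<Node>> kids;   // operands, or the competing parses for Ambiguity
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(Kind k, std::vector<NodePtr> kids, std::string name = "") {
    return std::make_shared<Node>(Node{k, std::move(name), std::move(kids)});
}

// Shape of what a GLR parser could yield for "a * b ;" with no type information:
// one Ambiguity node whose two alternatives share the same leaf nodes.
NodePtr parse_star_statement(const std::string& a, const std::string& b) {
    NodePtr left = make(Kind::Identifier, {}, a);
    NodePtr right = make(Kind::Identifier, {}, b);
    NodePtr decl = make(Kind::Declaration, {left, right});  // b declared as pointer to type a
    NodePtr mult = make(Kind::Multiply, {left, right});     // a times b, result discarded
    return make(Kind::Ambiguity, {decl, mult});
}

// Post-parse pass: once name resolution knows which names are types,
// keep the alternative that agrees with it.
NodePtr resolve(const NodePtr& n, const std::set<std::string>& type_names) {
    if (n->kind != Kind::Ambiguity) return n;
    for (const NodePtr& alt : n->kids) {
        bool first_is_type = type_names.count(alt->kids[0]->name) > 0;
        if ((alt->kind == Kind::Declaration) == first_is_type) return alt;
    }
    return n;  // no alternative fits: keep the ambiguity so an error can be reported
}
```

For `parse_star_statement("x", "y")`, `resolve` returns the `Declaration` alternative when the type set contains `"x"` and the `Multiply` alternative when it doesn't. Deferring the choice this way is what lets the parser itself stay free of symbol-table logic.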
We use this technique in the C and C++ front ends for the
DMS Software Reengineering Toolkit.
They have been used to process millions of lines of code
in large C and C++ systems, and DMS uses the same approach for dozens of other languages.