Every undergraduate Intro to Compilers course reviews the commonly implemented subsets of context-free grammars: LL(k), SLR(k), LALR(k), LR(k). We are also taught that, for any given k, each of those grammar classes is a subset of the next.
What I've never seen is an explanation of what sorts of programming-language syntactic features might require moving to a more powerful grammar class. There's an obvious practical motivation for GLR parsers, namely avoiding an unholy commingling of parser and symbol table when parsing C++. But what about the differences between the two "standard" classes, LL and LR?
Two questions:
1. What (general) syntactic constructions can be parsed with LR(k) but not LL(k')? (A concrete sketch of the sort of thing I mean follows this list.)
2. In what ways, if any, do those constructions manifest as desirable language constructs?
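To make question (1) concrete, the textbook case is left recursion: a production like `expr -> expr '-' NUM` is LR(1) but not LL(k) for any k, because a top-down parser would have to expand `expr` forever before consuming a token. Below is a minimal Python sketch (my own illustrative example, using a bare token list as input) of the standard LL workaround, which rewrites the grammar and then recovers left associativity with a loop:

```python
# Left-recursive grammar (fine for LR, fatal for LL):
#   expr -> expr '-' NUM | NUM
#
# Naive recursive descent on it would call expr() from expr() without
# consuming any input and recurse forever. The usual LL rewrite is:
#   expr -> NUM rest
#   rest -> '-' NUM rest | epsilon
# which parses top-down, but the left associativity of '-' now has to
# be rebuilt by hand with an accumulator:
def parse_expr(tokens):
    pos = 0
    value = int(tokens[pos]); pos += 1          # expr -> NUM ...
    while pos < len(tokens) and tokens[pos] == "-":
        pos += 1
        value -= int(tokens[pos]); pos += 1     # fold left: ((a - b) - c)
    return value

print(parse_expr(["10", "-", "3", "-", "2"]))   # 5, i.e. (10 - 3) - 2
```

Note that the grammar the LL parser actually implements no longer mirrors the structure of the value it computes, which is the kind of "manifestation" question (2) is after.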
There's a plausible argument for reducing grammar power by making k as small as possible: a language requiring many, many tokens of lookahead will be harder for humans to parse, as well as "harder" for machines to parse. Question (2) implicitly asks whether the same reasoning ends up holding between classes as well as within a class.
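For a feel of what one extra token of lookahead means in practice, here's a hedged sketch (hypothetical grammar and token type of my own invention) where both productions of a statement begin with an identifier, so an LL(1) parser cannot choose a production but an LL(2) parser can:

```python
from collections import namedtuple

Token = namedtuple("Token", ["kind", "text"])

# Hypothetical grammar where one token of lookahead is not enough:
#   stmt -> ID '=' ID        (assignment)
#   stmt -> ID '(' ID ')'    (call)
# Both productions start with ID, so LL(1) cannot pick between them;
# peeking at the *second* token (LL(2)) resolves the choice.
def parse_stmt(tokens):
    if tokens[0].kind != "ID":
        raise SyntaxError("expected identifier")
    if tokens[1].kind == "=":                   # second token of lookahead
        return ("assign", tokens[0].text, tokens[2].text)
    if tokens[1].kind == "(":
        return ("call", tokens[0].text, tokens[2].text)
    raise SyntaxError("expected '=' or '(' after identifier")

print(parse_stmt([Token("ID", "x"), Token("=", "="), Token("ID", "y")]))
# ('assign', 'x', 'y')
```

Left-factoring (`stmt -> ID tail`) would bring this back to LL(1), but at the cost of a nonterminal (`tail`) that corresponds to no concept in the language, which is the same trade-off between machine convenience and human-readable structure.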
edit: Here's one example to illustrate the sorts of answers I'm looking for, but for regular languages instead of context-free:
When describing a regular language, one usually gets three operators: `+`, `*`, and `?`. Now, you can remove `+` without reducing the power of the language; instead of writing `x+`, you write `xx*`, and the effect is the same. But if `x` is some big and hairy expression, the two `x`s are likely to diverge over time due to human forgetfulness, yielding a syntactically correct regular expression that doesn't match the original author's intent. Thus, even though adding `+` doesn't strictly add power, it does make the notation less error-prone.
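For what it's worth, the equivalence is easy to check mechanically; here's a small Python demonstration (the particular "big and hairy" subexpression is made up for illustration) that the `x+` and `xx*` forms accept the same strings, so long as both copies of `x` stay in sync:

```python
import re

# A deliberately hairy subexpression standing in for "x":
x = r"(?:[A-Za-z_][A-Za-z0-9_]*\.)"

plus_form = re.compile(x + "+")        # x+  : one copy of x
dup_form  = re.compile(x + x + "*")    # xx* : two copies to keep in sync

for s in ["foo.", "foo.bar.", "foo", ""]:
    assert bool(plus_form.fullmatch(s)) == bool(dup_form.fullmatch(s))

# If a later edit touches only one of the two copies in xx*, the pattern
# stays syntactically valid but quietly matches a different language.
```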
Are there constructs with similar practical (human?) effects that must be "removed" when switching from LR to LL?