Every undergraduate Intro to Compilers course reviews the commonly implemented subsets of context-free grammars: LL(k), SLR(k), LALR(k), LR(k). We are also taught that, for any given k, each of those grammar classes is a subset of the next.
What I've never seen is an explanation of what sorts of programming-language syntactic features might require moving to a more powerful grammar class. There's an obvious practical motivation for GLR parsers, namely avoiding an unholy commingling of parser and symbol table when parsing C++. But what about the differences between the two "standard" classes, LL and LR?
Two questions:
1. What (general) syntactic constructions can be parsed with LR(k) but not LL(k')? (A concrete sketch of the sort of thing I mean follows this list.)
2. In what ways, if any, do those constructions manifest as desirable language constructs?
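To make question (1) concrete, the textbook case is left recursion: a production like `expr -> expr '-' NUM` is LR(1) but not LL(k) for any k, because a top-down parser would have to expand `expr` forever before consuming a token. Below is a minimal Python sketch (my own illustrative example, using a bare token list as input) of the standard LL workaround, which rewrites the grammar and then recovers left associativity with a loop:

```python
# Left-recursive grammar (fine for LR, fatal for LL):
#   expr -> expr '-' NUM | NUM
#
# Naive recursive descent on it would call expr() from expr() without
# consuming any input and recurse forever. The usual LL rewrite is:
#   expr -> NUM rest
#   rest -> '-' NUM rest | epsilon
# which parses top-down, but the left associativity of '-' now has to
# be rebuilt by hand with an accumulator:
def parse_expr(tokens):
    pos = 0
    value = int(tokens[pos]); pos += 1          # expr -> NUM ...
    while pos < len(tokens) and tokens[pos] == "-":
        pos += 1
        value -= int(tokens[pos]); pos += 1     # fold left: ((a - b) - c)
    return value

print(parse_expr(["10", "-", "3", "-", "2"]))   # 5, i.e. (10 - 3) - 2
```

Note that the grammar the LL parser actually implements no longer mirrors the structure of the value it computes, which is the kind of "manifestation" question (2) is after.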
There's a plausible argument for reducing grammar power by making k as small as possible: a language requiring many, many tokens of lookahead will be harder for humans to parse, as well as "harder" for machines to parse. Question (2) implicitly asks whether the same reasoning ends up holding between classes as well as within a class.
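For a feel of what one extra token of lookahead means in practice, here's a hedged sketch (hypothetical grammar and token type of my own invention) where both productions of a statement begin with an identifier, so an LL(1) parser cannot choose a production but an LL(2) parser can:

```python
from collections import namedtuple

Token = namedtuple("Token", ["kind", "text"])

# Hypothetical grammar where one token of lookahead is not enough:
#   stmt -> ID '=' ID        (assignment)
#   stmt -> ID '(' ID ')'    (call)
# Both productions start with ID, so LL(1) cannot pick between them;
# peeking at the *second* token (LL(2)) resolves the choice.
def parse_stmt(tokens):
    if tokens[0].kind != "ID":
        raise SyntaxError("expected identifier")
    if tokens[1].kind == "=":                   # second token of lookahead
        return ("assign", tokens[0].text, tokens[2].text)
    if tokens[1].kind == "(":
        return ("call", tokens[0].text, tokens[2].text)
    raise SyntaxError("expected '=' or '(' after identifier")

print(parse_stmt([Token("ID", "x"), Token("=", "="), Token("ID", "y")]))
# ('assign', 'x', 'y')
```

Left-factoring (`stmt -> ID tail`) would bring this back to LL(1), but at the cost of a nonterminal (`tail`) that corresponds to no concept in the language, which is the same trade-off between machine convenience and human-readable structure.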
edit: Here's one example to illustrate the sorts of answers I'm looking for, but for regular languages instead of context-free:
When describing a regular language, one usually gets three operators: `+`, `*`, and `?`. Now, you can remove `+` without reducing the power of the language; instead of writing `x+`, you write `xx*`, and the effect is the same. But if `x` is some big and hairy expression, the two `x`s are likely to diverge over time due to human forgetfulness, yielding a syntactically correct regular expression that doesn't match the original author's intent. Thus, even though adding `+` doesn't strictly add power, it does make the notation less error-prone.
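For what it's worth, the equivalence is easy to check mechanically; here's a small Python demonstration (the particular "big and hairy" subexpression is made up for illustration) that the `x+` and `xx*` forms accept the same strings, so long as both copies of `x` stay in sync:

```python
import re

# A deliberately hairy subexpression standing in for "x":
x = r"(?:[A-Za-z_][A-Za-z0-9_]*\.)"

plus_form = re.compile(x + "+")        # x+  : one copy of x
dup_form  = re.compile(x + x + "*")    # xx* : two copies to keep in sync

for s in ["foo.", "foo.bar.", "foo", ""]:
    assert bool(plus_form.fullmatch(s)) == bool(dup_form.fullmatch(s))

# If a later edit touches only one of the two copies in xx*, the pattern
# stays syntactically valid but quietly matches a different language.
```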
Are there constructs with similar practical (human?) effects that must be "removed" when switching from LR to LL?