views:

501

answers:

3

How do recursive ascent parsers work? I have written a recursive descent parser myself but I don't understand LR parsers all that well. What I found on Wikipedia has only added to my confusion.

Another question is why recursive ascent parsers aren't used more than their table-based counterparts. It seems that recursive ascent parsers have greater performance overall.

+2  A: 

I'm personally having a hard time understanding how a function call can be faster than a table lookup -- much less "significantly faster". And I suspect that even "significantly faster" is insignificant when compared to everything else that a lexer/parser has to do (primarily reading and tokenizing the file). I looked at the Wikipedia page, but didn't follow the references; did the author actually profile a complete lexer/parser?

More interesting to me is the decline of table-driven parsers with respect to recursive descent. I come from a C background, where yacc (or equivalent) was the parser generator of choice. When I moved to Java, I found one table-driven implementation (JavaCup), and several recursive descent implementations (JavaCC, ANTLR).

I suspect that the answer is similar to the answer of "why Java instead of C": speed of execution isn't as important as speed of development. As noted in the Wikipedia article, table-driven parsers are pretty much impossible to understand from code (back when I was using them, I could follow their actions but would never have been able to reconstruct the grammar from the parser). Recursive descent, by comparison, is very intuitive (which is no doubt why it predates table-driven by some 20 years).

kdgregory
+1 from me - a really nice answer.
duffymo
Recursive descent is slower than table-driven LR parsers. I'm wondering about recursive ascent, which is a different beast altogether.
Daniel W
Yes, per the article recursive ascent has inline function calls to represent the state machine, rather than a variable and tables. And as I said in my first paragraph, I'm not buying that it's faster, particularly within the context of how a parser is used.
kdgregory
But I'm also making a bigger point: that the raw speed of a parser implementation is rarely what matters. Particularly within the context of how a parser is used.
kdgregory
I think the main selling point of method based parsers (esp recursive descent) is debuggability. Walking the calls is much easier than a table. *However*, there are some debuggers I've seen that map shifts/reduces back to the grammar and make it more readable.
Scott Stanchfield
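The "states as inline function calls" idea the comments above describe can be made concrete. Below is a minimal recursive-ascent sketch (my own illustration, in Python, not taken from any of the papers) for the toy grammar E -> E '+' 'n' | 'n'. Each LR(0) state is a function: a shift is a call into the next state's function, and a reduction of length k returns (nonterminal, k - 1); each caller pops one stack frame by returning until the count reaches 0, then performs the "goto" for that nonterminal.

```python
# Illustrative sketch: recursive-ascent parsing for
#     E -> E '+' 'n'
#     E -> 'n'
# The host language's call stack plays the role of the LR state stack.

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ['$']   # '$' marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def shift(self):
        self.pos += 1

    def parse(self):
        return self.state0()

    # State 0: S -> . E $ ; E -> . E + n ; E -> . n
    def state0(self):
        if self.peek() != 'n':
            raise ParseError("expected 'n'")
        self.shift()
        nt, pops = self.state2()             # shift 'n'; reduces E -> n
        while True:
            assert nt == 'E' and pops == 0   # reductions bottom out here
            result = self.state1()           # goto on E
            if result == 'ACCEPT':
                return True
            nt, pops = result

    # State 1: S -> E . $ ; E -> E . + n
    def state1(self):
        if self.peek() == '$':
            return 'ACCEPT'
        if self.peek() == '+':
            self.shift()
            nt, pops = self.state3()         # shift '+'
            return (nt, pops - 1)            # pop this frame, keep unwinding
        raise ParseError("expected '+' or end of input")

    # State 2: E -> n .   (reduce; 1 symbol popped)
    def state2(self):
        return ('E', 0)                      # returning pops the one frame

    # State 3: E -> E + . n
    def state3(self):
        if self.peek() != 'n':
            raise ParseError("expected 'n'")
        self.shift()
        nt, pops = self.state4()             # shift 'n'
        return (nt, pops - 1)

    # State 4: E -> E + n .  (reduce; 3 symbols popped)
    def state4(self):
        return ('E', 2)                      # 2 more frames to pop above
```

`Parser(['n', '+', 'n']).parse()` accepts; whether unwinding the call stack like this beats a table-driven loop is exactly the performance claim under dispute above.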
+2  A: 

The classical dragon book explains very well how LR parsers work. There is also Parsing Techniques: A Practical Guide, where you can read about them, if I remember correctly. The Wikipedia article (at least the introduction) is not right. They were created by Donald Knuth, and he explains them in his The Art of Computer Programming Volume 5. If you understand Spanish, there is a complete list of books here posted by me. Not all of those books are in Spanish, though.

Before trying to understand how they work, you must understand a few concepts like FIRST sets, FOLLOW sets, and lookahead. I also really recommend understanding the concepts behind LL (top-down) parsers before trying to understand LR (bottom-up) parsers.

There is a family of LR parsers, notably LR(k), SLR(k) and LALR(k), where k is how many lookahead tokens they need to work. Yacc supports LALR(1) parsers, but you can make tweaks, not theory-based, to make it work with more powerful kinds of grammars.

As for performance, it depends on the grammar being analyzed. They execute in linear time, but how much space they need depends on how many states you build for the final parser.
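To illustrate the FIRST-set concept mentioned above, here is a small fixed-point computation (my own sketch; the grammar encoding is an assumption, not a standard API). A grammar is a dict mapping each nonterminal to its productions; anything not in the dict is a terminal, and 'eps' marks an empty production.

```python
# Compute FIRST sets by iterating to a fixed point: keep propagating
# terminals through productions until no set grows anymore.

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for sym in prod:
                    if sym not in grammar:               # terminal (or 'eps')
                        if sym not in first[nt]:
                            first[nt].add(sym)
                            changed = True
                        break
                    before = len(first[nt])
                    first[nt] |= first[sym] - {'eps'}    # nonterminal's FIRST
                    if len(first[nt]) != before:
                        changed = True
                    if 'eps' not in first[sym]:          # not nullable: stop
                        break
                else:
                    if 'eps' not in first[nt]:           # whole RHS nullable
                        first[nt].add('eps')
                        changed = True
    return first
```

For the grammar E -> T Eq; Eq -> '+' T Eq | eps; T -> 'n', this yields FIRST(E) = {'n'}, FIRST(Eq) = {'+', 'eps'}, FIRST(T) = {'n'}.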

eKek0
Volume 5 is years away! At best. I hope I can come back to this page in 2020 or whenever and laugh at this comment.
Darius Bacon
A: 

The Wikipedia article on recursive ascent parsing references what appears to be the original paper on the topic ("Very Fast LR Parsing"). Skimming that paper cleared a few things up for me. Things I noticed:

  1. The paper talks about generating assembly code. I wonder if you can do the same things they do if you're generating C or Java code instead; see sections 4 and 5, "Error recovery" and "Stack overflow checking". (I'm not trying to FUD their technique -- it might work out fine -- just saying that it's something you might want to look into before committing.)

  2. They compare their recursive ascent tool to their own table-driven parser. From the description in their results section, it looks like their table-driven parser is "fully interpreted"; it doesn't require any custom generated code. I wonder if there's a middle ground where the overall structure is still table-driven but you generate custom code for certain actions to speed things up.
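For contrast with the generated-code approach, the "fully interpreted" style mentioned in point 2 is roughly a loop over ACTION/GOTO tables. A minimal sketch (tables hand-written here for the toy grammar E -> E '+' 'n' | 'n'; a real generator would emit them):

```python
# A table-interpreted LR parser: one generic loop, all grammar-specific
# knowledge lives in the tables.
# ('s', k) = shift to state k; ('r', nt, n) = reduce a production of
# length n to nonterminal nt; 'acc' = accept.
ACTION = {
    0: {'n': ('s', 2)},
    1: {'+': ('s', 3), '$': 'acc'},
    2: {'+': ('r', 'E', 1), '$': ('r', 'E', 1)},   # E -> n
    3: {'n': ('s', 4)},
    4: {'+': ('r', 'E', 3), '$': ('r', 'E', 3)},   # E -> E + n
}
GOTO = {0: {'E': 1}}

def parse(tokens):
    tokens = list(tokens) + ['$']
    stack, pos = [0], 0                   # explicit state stack
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            return False                  # no entry: syntax error
        if act == 'acc':
            return True
        if act[0] == 's':                 # shift: push state, consume token
            stack.append(act[1])
            pos += 1
        else:                             # reduce: pop |rhs| states, goto
            _, nt, length = act
            del stack[-length:]
            stack.append(GOTO[stack[-1]][nt])
```

The middle ground point 2 speculates about would keep this loop but compile the per-action work (semantic actions, common state/token pairs) into custom code.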

The paper referenced by the Wikipedia page:

Another paper about using code-generation instead of table-interpretation:

Also, note that recursive-descent parsing is not the fastest way to parse LL-grammar-based languages:

Kannan Goundan
Indeed, recursive-descent parsing is by no means the fastest.
Daniel W