views:

68

answers:

1

There are lots of different parsing algorithms out there (recursive descent, LL(k), LR(k), LALR, ...). I find a lot of information about the different grammars different types of parser can accept. But how do they differ in runtime behavior? Which algorithm is faster, uses less memory or stack space?

Or to put this differently - which algorithm performs best, assuming the grammar can be formulated to work with any algorithm?

+2  A: 

LR parsers IMHO can be the fastest. Basically they use the current token as an index into a lookahead set or a transition table to decide what to do next (push a state index, pop state indexes, call a reduction routine). Converted to machine code, this can take just a few machine instructions per token. Pennello discusses this in detail in his paper:

Thomas J. Pennello: Very fast LR parsing. SIGPLAN Symposium on Compiler Construction 1986: 145-151
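To make the "token indexes a transition table" idea concrete, here is a minimal sketch (not from the paper) of a table-driven shift-reduce parser in Python, for the toy grammar `S -> S + n | n`. The ACTION/GOTO tables are hand-derived for this grammar; a real generator such as yacc or bison would emit them, and Pennello's technique compiles this loop down to machine code.

```python
# Minimal table-driven LR-style parser for the toy grammar
#   S -> S + n | n
# The hot loop does exactly what the answer describes: use the
# current token to index ACTION, then either push a state (shift)
# or pop states and consult GOTO (reduce).

ACTION = {  # (state, token) -> ('s', next_state) | ('r', rule) | ('acc',)
    (0, 'n'): ('s', 2),
    (1, '+'): ('s', 3), (1, '$'): ('acc',),
    (2, '+'): ('r', 2), (2, '$'): ('r', 2),
    (3, 'n'): ('s', 4),
    (4, '+'): ('r', 1), (4, '$'): ('r', 1),
}
GOTO = {(0, 'S'): 1}                 # nonterminal transition after a reduce
RULES = {1: ('S', 3), 2: ('S', 1)}   # rule number -> (lhs, rhs length)

def parse(tokens):
    """tokens: list of (kind, value) pairs, terminated by ('$', None)."""
    states, values = [0], []
    i = 0
    while True:
        kind, value = tokens[i]
        act = ACTION[(states[-1], kind)]
        if act[0] == 's':                     # shift: push state, consume token
            states.append(act[1])
            values.append(value)
            i += 1
        elif act[0] == 'r':                   # reduce: pop |rhs| states, goto
            lhs, n = RULES[act[1]]
            if act[1] == 1:                   # S -> S + n : fold the addition
                right = values.pop(); values.pop(); left = values.pop()
                values.append(left + right)
            del states[-n:]
            states.append(GOTO[(states[-1], lhs)])
        else:                                 # accept
            return values[-1]

print(parse([('n', 1), ('+', None), ('n', 2), ('$', None)]))  # prints 3
```

Each iteration is one or two dictionary lookups plus a stack push or pop; with dense arrays instead of dictionaries, that is the "few machine instructions per token" regime.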

LL parsers, in their recursive-descent form, involve recursive calls, which are a bit slower than plain table lookups, but they can still be pretty fast.
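For comparison, here is a sketch (mine, not from the answer) of the recursive-descent form for the same kind of left-associative sum, with the left recursion rewritten as iteration: `E -> n ('+' n)*`. Each nonterminal becomes a function, and those function calls are the per-construct overhead being referred to.

```python
# Minimal recursive-descent (LL) parser for left-associative sums,
# grammar rewritten without left recursion:
#   E -> n ('+' n)*
# Each nonterminal maps to a function; calling it is the overhead
# an LR table lookup avoids.

def parse_sum(tokens):
    """tokens: flat list of ints and '+' strings, e.g. [1, '+', 2]."""
    pos = 0

    def expr():
        nonlocal pos
        value = tokens[pos]; pos += 1        # E -> n ...
        while pos < len(tokens) and tokens[pos] == '+':
            pos += 1                          # consume '+'
            value += tokens[pos]; pos += 1    # ... ('+' n)*
        return value

    return expr()

print(parse_sum([1, '+', 2, '+', 3]))  # prints 6
```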

GLR parsers are generalizations of LR parsers, and thus have to be slower than LR parsers. A key observation is that most of the time a GLR parser is acting exactly as an LR parser would, and one can make that part run at essentially the same speed as an LR parser, so they can be fairly fast.

Your parser is likely to spend more time breaking the input stream into tokens, than executing the parsing algorithm, so these differences may not matter a lot.

In terms of getting your grammar into a usable form, the following is the order in which the parsing technologies "make it easy":

 *   GLR   (really easy: if you can write grammar rules, you can parse)
 *   LR(k) (many grammars fit, but extremely few parser generators exist)
 *   LR(1) (the most commonly available [YACC, Bison, Gold, ...])
 *   LL    (usually requires significant reengineering of the grammar to remove left recursion)
Ira Baxter
LL parsers don't require recursion. They can be implemented with tables. Both LR and LL are O(N).
EJP
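EJP's point can be sketched as follows (my illustration, not his): a predictive LL(1) parser driven by a parse table and an explicit stack, with no recursive calls at all. The grammar is the usual left-recursion-free rewrite `E -> n R`, `R -> + n R | epsilon`, and the table entries are hand-derived for it.

```python
# Table-driven LL(1) recognizer with an explicit stack -- no recursion.
# Grammar (left recursion already removed):
#   E -> n R
#   R -> + n R | epsilon
TABLE = {
    ('E', 'n'): ['n', 'R'],
    ('R', '+'): ['+', 'n', 'R'],
    ('R', '$'): [],                  # R -> epsilon
}

def accepts(tokens):
    """tokens: list of terminal kinds, terminated by '$'."""
    stack = ['$', 'E']               # start symbol on top
    i = 0
    while stack:
        top = stack.pop()
        if top in ('n', '+', '$'):   # terminal: must match the input
            if top != tokens[i]:
                return False
            i += 1
        else:                        # nonterminal: expand via the table
            rule = TABLE.get((top, tokens[i]))
            if rule is None:
                return False
            stack.extend(reversed(rule))
    return i == len(tokens)

print(accepts(['n', '+', 'n', '$']))  # prints True
```

Like the LR loop, each step is one table lookup plus stack operations, which is why both families end up O(n) with only the constant factor differing.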
When algorithms are both O(n), the *constant* factor is what matters. Pennello's approach constructively builds LR parsers that literally execute only a few machine instructions per token. Turbo Pascal, IIRC, used a recursive descent parser and was famously fast, although there is no actual analysis of whether it hit a few machine instructions per token. So it's really a matter of how hard you push on the engineering for parse speed. But the key thing to optimize first is token extraction, because it is hard NOT to spend a few machine instructions per character.
Ira Baxter