tags:

views:

315

answers:

3

I've worked on Flex, Bison few years ago during my undergraduate studies. However, I don't remember much about it now. Recently, I have come to hear about Antlr. Would you recommend that I learn Antlr or better to brush up Flex/Bison. Does Antlr have more/less features than Flex/Bison ?

Thanks,

+3  A: 

ANTLRv3 is LL(k), and can be configured to be LL(*). The latter in particular is ridiculously easy to write parsers in, as you can essentially use EBNF as-is.

Also, ANTLR generates code that is quite a lot like recursive descent parser you'd write from scratch. It's very readable and easy to debug to see why the parse doesn't work, or works wrong.

The advantage of Flex/Bison (or any other LALR parser) is that it's faster.

Pavel Minaev
Faster? Are you comparing LALR to LL(k) or LL(\*)? LL(k) parsing is quite fast. LL(\*) parsing is slow because of backtracking, but you rarely need to use it (and even when you do, I believe ANTLR lets you use LL(\*) for a subset of your grammar). The one advantage LALR has that I know of is that there are certain grammars that can't be parsed LL(k) that can be parsed LALR, but in practice most grammars that you'd actually *want* to parse can be re-expressed in a form that can be parsed LL(k).
Laurence Gonsalves
Yes. LALR is faster because you can use a table-driven finite state machine for that. It's incomprehensible and nigh impossible to debug that once it's produced by a parser generator like Bison, but it's also very fast. LL(k) as implemented by ANTLR uses recursive descent parsing, which is somewhat slower.
Pavel Minaev
I just tried the antlrworks tool and the following thing did not work:expr : expr '+' expr;because it does not allow left recursion. Don't you all find that a bit inconvenient. You have to then convert it to something likeexpr : expr ('+' expr)*i.e. write it like a regular expression. In flex I could write the former way and I found it much more natural than this regex style.
ajay
@ajay, that's the gist of the difference. For some constructs, writing them right-recursive is more natural. For others, writing them left-recursive is more natural. That said, for a simple grammar for something like `1 * (2 + 3) - 4` which your example suggests, you'd rather write something like: `expr: term ('+' | '-') expr; term: factor ('*' | '/') term; factor : literal_number | '(' expr ')'`. Which one is more natural is a matter of debate, but notice how ANTLR doesn't need any special construct for operator associativity - it's inherent in the grammar terms themselves.
Pavel Minaev
A: 

We have decided to use ANTLR for some of our information processing requirements - parsing legacy files and natural language. The learning curve is steep ut we are getting on top of it and I feel that it's a more modern and versatile approach for what we need to do. The disadvantages - since you ask - are mainly the learning curve which seems to be inevitable.

peter.murray.rust
+1  A: 

ANTLR has a run-time library JAR that you must include in your project.

ANTLR's recursive-descent parsers are easier to debug than the "bottom-up" parsers generated by Flex/Bison, but the grammar rules are slightly different.

If you want a Flex/Bison-style (LALR) parser generator for Java, look at JavaCC

Stuart Sierra