views:

1305

answers:

15

I am teaching (with others) a relatively introductory course in computer science for IT professionals without a background in CS. Since I developed the course materials on automata and grammars, I am also responsible for teaching about compilers and compiler construction.

Years ago, when I studied compilation in college, all our examples came from Lex and Yacc. Are these still in widespread use? Is there something that is more commonly used for Java? The students are proficient in C and Java but have never used parser generators.

Any tips on what to teach would be appreciated

+23  A: 

Antlr is widely used, well documented, and free. It is supported by Ant, and can target Java among many other languages.

Doug Currie
That's right. If they already know java, antlr is the way to go.
tunaranch
+1  A: 

I remember using CUP and liking it. Take a look at the CUP Parser Generator for Java.

CUP is maintained at the Technical University of Munich. I believe it's primary purpose is to teach students.

It also has a free licensing model.

...Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice and warranty disclaimer appear in supporting documentation...

Brian R. Bondy
+5  A: 

Lex and Yacc are still in use. One of the newest languages around, F#, has it's own versions (fslex, fsyacc -- see here for an example.) So I think teaching them is still relevant.

OJ
+1  A: 

You could skip the generator part and have a look at Scalas parser combinators.

John Nilsson
+3  A: 

PEG parser systems like RATS are simpler than the lex/yacc combo. This may or may not be a plus for your class: is your goal to teach about regular expressions and finite automata, and LR grammars and pushdown automata, etc.? Or do you want the simplest practical compiler frontend tools?

(Since I don't program in Java these days I haven't tried RATS in particular.)

Darius Bacon
A: 

I am currently taking a compilers course which uses Lex and Yacc. I don't really know about any other types out there, but the theory we're learning seems to map pretty well to these tools.

Ryan Fox
+9  A: 

It's a pity your students aren't well-versed in C++. Once I came across the Spirit library with its concept of a rich, EBNF-style DSL, I've left Antlr, Lex and Yacc behind! It's much more flexible having the grammar described alongside the code.

Brilliant library, though with an admittedly non-trivial learning curve.

However, without C++, Antlr is probably your best bet.

MattyT
I even used spirit to parse command line arguments ... :)
xtofl
Don't like the program_options library, eh? :)
MattyT
in our university (i'm still studying), we use Boost::Spririt as basis for teaching both parsing and advanced C++.It works really well for those who are motivated to learn, but of course, seems far too indepth for cruisy students
Alex Lim
+10  A: 

I don't use lexer and parser generators. They're simple enough to generate by hand, and are the easiest parts of a compiler to write. Besides, when you build them by hand, you can make them really fast.

Walter Bright
You apparently have never written complicated parser/lexers... you really don't want to maintain those by hand if they get complicated.
Jorn
Yes, I've done parsers and lexers by hand that do arbitrary lookahead and backtracking, and even one for C++! (The C++ is a bit harder because of its interactions with the preprocessor, tokens that exist or not based on compiler switches, etc.)
Walter Bright
+2  A: 

Javacc it's very easy.
In the same file you have the grammar and the token list.

https://javacc.dev.java.net/

damian
A: 

I remember using Bison in one of my compilers classes. We also used flex and YACC.

Adam Pope
+1  A: 

If you plan to work with Java, JavaCC or ANTLR should suffice. This latter one also support C and Python. But if you plan to work with C++, maybe you should take a look at Boost::Spirit.

Alexandre Hamez
+1  A: 

Haven't tried it yet, but I found jparsec a few days ago. It is no parser generator, instead the parser is build in java by combinators in an EBNF style.

Johannes B
+4  A: 

Yacc and all the other LALR(1) parsers date from an era when machine resources were scarce and it was necessary to spend a lot of time engineering the grammar so that you could run a parser at all on a PDP-11 with 64K of RAM. Today it does not make sense to teach a tool like yacc with a terrible human interface and a very limited set of grammars it can use.

I would recommend either one of the PEG-based parsers, such as Rats!, or the GLR parser Elkhound developed by George Necula and Scott McPeak (thanks quark). Sorry I can't recommend a specific tool for Java, but Rats! is good for C.

ANTLR is OK but is too complex for my taste.

Norman Ramsey
Open source GLR parser: Elkhound. http://www.scottmcpeak.com/elkhound. It's even been used to generate a C++ parser.
quark
A: 

OCaml has a fantastic set of parser generators. Here are some simple examples.

JavaCC is also quite good.

I would strongly recommend avoiding C (and C++) for this purpose because they are extraordinary painful in this context.

Jon Harrop
+1  A: 

I like the GOLD Parsing System very much, because it basically generates the tables needed and you then only have to use a (generic) implementation of a processor which uses the table information to process the tokens. This engine (as it is called) is quite easy to write and is basically a pure implementation using the LALR and DFA tables to process the input, and writing such an implementation may be a good exercise to teach those.

Lucero
Why the downvote? Anyone?
Lucero