PS. Where to read about parsing theory?

+1  A: 

Use YACC; there are various implementations for different languages.

Good luck with your language ;-)

klez
+6  A: 

It's always a good idea to read the Dragon Book. But be aware that if your language is not trivial, there's not really a "short" way to do it.

fvu
+3  A: 

It rather depends on your language. Some very simple languages need very little parsing and can be hand-coded; other languages use PEG generators such as Rats! (PEG stands for parsing expression grammar, which sits between a regex and an LR parser) or conventional parser generators such as Antlr and Yacc. Less formal languages require probabilistic techniques such as link grammars.

Pete Kirkham
+1  A: 

I used the GOLD Parsing System, because it seemed easier to use than ANTLR for a novice like me, while still being sufficiently full-featured for my needs. The web site includes documentation (including instructions on writing grammars, which is half the work) as well as software.

ChrisW
+1  A: 

Try Bison for parsing and Flex for lexing.

The Bison definition of your language is in the form of a context-free grammar. The Wikipedia article on this topic is quite good, and is probably a good place to start.
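
To make the lexing/parsing split concrete, here is a minimal sketch of the lexing half in Python rather than in Flex syntax. The token names and the `tokenize` function are made up for illustration; a real Flex file would declare similar patterns, and the Bison grammar would then be written in terms of the token names.

```python
import re

# Hypothetical token set for a tiny arithmetic language.
# A context-free grammar over these tokens might look roughly like:
#   expr : expr PLUS term | term
#   term : term TIMES atom | atom
#   atom : NUMBER | LPAREN expr RPAREN
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise on unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace is recognised but dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

The parser never sees raw characters, only the `(kind, lexeme)` pairs, which is the same division of labour that Flex and Bison use.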

Joel
+2  A: 

Write a Recursive Descent Parser. This is sometimes easier than YACC/BISON, and usually more intuitive.
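
As a rough illustration of the idea, here is a minimal sketch in Python for a small arithmetic language. The grammar, the `Parser` class, and the `evaluate` helper are all hypothetical; the point is only that each grammar rule becomes one method, which is what makes the approach intuitive.

```python
import re


def tokenize(text):
    # Numbers, operators, and parentheses. Anything else (including
    # whitespace) is silently skipped, which is fine for a sketch only.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    """Grammar (hypothetical):
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return result
```

This version evaluates while it parses; building a tree instead would mean returning node objects from each method rather than numbers.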

Yuval F
+4  A: 

Summary: the shortest is probably Antlr.

It's tempting to go to the Dragon Book to learn about parsing theory. But I don't think the Dragon Book and you have the same idea of what "theory" means. The Dragon Book describes how to build hand-written parsers, parser generators, etc., but you almost certainly want to use a parser-generation tool instead.

A few people have suggested Bison and Flex (or their older versions Yacc and Lex). Those are the old stalwarts, but they are not very usable tools. Their documentation is not poor per se; it's just that it doesn't quite help in dealing with the accidental complexity of using them. Their internal data is not well encapsulated, and it is very hard to do anything advanced with them. As an example, in phc we still do not have correct line numbers because getting them right is very difficult. They got better when we modified our grammar to include no-op statements, but that is an incredible hack which should not be necessary.

Ostensibly, Bison and Flex work together, but the interface is awkward. Worse, there are many versions of each, which only play nicely with some specific versions of the other. And, last I checked at least, the documentation of which versions went with which was pretty poor.

Writing a recursive descent parser is straightforward, but can be tedious. Antlr can do that for you, and it seems to be a pretty good toolset, with the benefit that what you learn on this project can be applied to lots of other languages and platforms (Antlr is very portable). There are also lots of existing grammars to learn from.

It's not clear what language you're working in, but some languages have excellent parsing frameworks. In particular, the Haskell Parsec library seems very elegant. If you use C++ you might be tempted to use Spirit. I found it very easy to get started with, and difficult--but still possible--to do advanced things with it. This matches my experience of C++ in general. I say I found it easy to start, but then I had already written a couple of parsers, and studied parsing in compiler class.

Long story short: Antlr, unless you've a very good reason.

Paul Biggar
I don't agree with you. Bison and Flex have good documentation.
Kinopiko
@Kinopiko: fair enough. I guess that's not exactly what I meant. Hope it's better/fairer now.
Paul Biggar
+1  A: 

Douglas Crockford has an approachable example of a parser written in JavaScript.

Steven Huwig
A: 

Using a parser generator for your host language is the fastest way, combined with parsing theory from a book such as the Dragon Book or the Modern Compiler Implementation in {C,ML} series.

If you use C, yacc and its GNU version, bison, are the standard generators. Antlr is widely used in many languages, supporting Java, C#, and C++ as far as I know. There are also many others in almost any language.

My personal favorite at present is Menhir, an excellent parser generator for OCaml. ML-style languages (OCaml, Standard ML, etc.) are in general very good for building compilers and interpreters.

Michael E
A: 

ANTLR is the easiest for someone without compiler theory background because of:

  • ANTLRWorks (visual parsing and AST debugging)

  • The ANTLR book (no compiler theory background required)

  • Just one syntax for both lexer and parser.

ktulur