What's the best way to create a parser in C++ from a file with grammar?

+13  A: 

You also might want to have a look at these links:

Samuel_xL
I second that. The Boost documentation is really helpful.
anno
I would suggest not using `boost::spirit` if you plan on a compiler of any decent size. Compile times for parsers built with `boost::spirit` tend to get very large, making even very small changes a PITA (because the whole thing is done with templates).
a_m0d
+7  A: 

There are flex and bison, cousins of Lex and Yacc that do take the existence of C++ into account.

Michael Krelin - hacker
+2  A: 

Have you looked at Lex and Yacc? To quote from section 5 of the linked document:

My preferred way to make a C++ parser is to have Lex generate a plain C file, and to let YACC generate C++ code. When you then link your application, you may run into some problems because the C++ code by default won't be able to find C functions, unless you've told it that those functions are extern "C".
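
A minimal sketch of the `extern "C"` point in the quote above. It assumes the scanner entry point is the conventional `yylex`. The definition below is only a stand-in so the snippet links by itself, and the token code 258 is an arbitrary placeholder; in a real build the function would come from the Lex-generated C file compiled as C.

```cpp
// Tell the C++ compiler that yylex has C linkage, so the linker looks for
// the unmangled symbol that a C compiler would emit for the Lex output.
extern "C" {
    int yylex(void);
}

// Stand-in definition (an assumption for this sketch only): returns three
// placeholder tokens, then 0, which is the usual end-of-input value.
extern "C" int yylex(void) {
    static int calls = 0;
    return calls++ < 3 ? 258 : 0;
}

// C++ code can now call the C-linkage scanner like any other function.
int countTokens() {
    int count = 0;
    while (yylex() != 0) {
        ++count;
    }
    return count;
}
```

Without the `extern "C"` declaration, the C++ side would reference a mangled name and the link step would fail to find the function from the C object file.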

Brian Agnew
Lex and Yacc have been superseded by Flex and Bison.
Martin York
+1  A: 

I realize that this doesn't really answer your question, but the best way to create a parser is to use lex and yacc.

Dima
Michael Krelin - hacker
I was assuming that the question was about how to write a parser by hand in C++.
Dima
@dima me too. 15 chars
Samuel_xL
Oh... That never occurred to me. Not sure it's exactly what OP meant.
Michael Krelin - hacker
+1  A: 

I've used bison and found the examples just right for my level. I was able to create a simple calculator with it; of course, it can do much more.

The calculator took 1+2*3, for example, and built a syntax tree. The documentation did not describe how to build the tree, however, and that took me a little time to work out.

If I were doing it again, I'd look into ANTLR, as it looked good and well supported.
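
For anyone stuck on the same step, here is a rough sketch of what the tree for 1+2*3 could look like. This is not bison's own output or API; `Node`, `makeNumber`, `makeBinary` and `evaluate` are hypothetical names, and the idea is that grammar actions would call helpers like these to build nodes.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// One node of the syntax tree: either a number leaf ('n') or a binary operator.
struct Node {
    char op;                      // 'n' for a number, otherwise '+' or '*'
    double value;                 // only meaningful when op == 'n'
    std::unique_ptr<Node> left;   // operands, empty for number leaves
    std::unique_ptr<Node> right;
};

// Helper a grammar action could call for a NUMBER token (hypothetical name).
std::unique_ptr<Node> makeNumber(double v) {
    auto node = std::make_unique<Node>();
    node->op = 'n';
    node->value = v;
    return node;
}

// Helper a grammar action could call for "expr op expr" (hypothetical name).
std::unique_ptr<Node> makeBinary(char op, std::unique_ptr<Node> lhs,
                                 std::unique_ptr<Node> rhs) {
    auto node = std::make_unique<Node>();
    node->op = op;
    node->left = std::move(lhs);
    node->right = std::move(rhs);
    return node;
}

// Walk the tree bottom-up to compute the result.
double evaluate(const Node& node) {
    switch (node.op) {
        case 'n': return node.value;
        case '+': return evaluate(*node.left) + evaluate(*node.right);
        case '*': return evaluate(*node.left) * evaluate(*node.right);
        default:  throw std::runtime_error("unknown operator");
    }
}
```

Because `*` binds tighter than `+`, the tree for 1+2*3 has `+` at the root with `2*3` as its right child, i.e. `makeBinary('+', makeNumber(1), makeBinary('*', makeNumber(2), makeNumber(3)))`.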

Martin.

martsbradley
+7  A: 

It depends heavily on the grammar. I tend to like recursive descent parsers, which are normally written by hand (though it's possible to generate one from a description of the grammar).
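
To give an idea of what a hand-written recursive descent parser looks like, here is a minimal sketch for a small arithmetic grammar. The grammar and the `Parser` class are made up for illustration; each grammar rule becomes one member function, and precedence falls out of which rule calls which.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Illustrative grammar (an assumption, not taken from the question):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class Parser {
public:
    explicit Parser(std::string text) : text_(std::move(text)) {}

    // Parse the whole input and return its value; throws on a syntax error.
    double parse() {
        double value = expr();
        skipSpaces();
        if (pos_ != text_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    void skipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
    }

    // Consume the character c if it is next in the input.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < text_.size() && text_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    double expr() {
        double value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    double term() {
        double value = factor();
        for (;;) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    double factor() {
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        // accept() has already skipped spaces, so a number must start here.
        std::size_t start = pos_;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stod(text_.substr(start, pos_ - start));
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

With this sketch, `Parser("1+2*3").parse()` yields 7 and `Parser("(1+2)*3").parse()` yields 9. It evaluates on the fly to stay short; returning tree nodes from each function instead would give a syntax tree.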

If you're going to use a parser generator, there are really two good choices: Byacc and Antlr. If you want something that's (reasonably) compatible with yacc, Byacc is (by far) your best choice. If you're starting from the beginning, with neither existing code nor experience that favors using something compatible with yacc, then Antlr is almost certainly your best bet.

Since it's been mentioned, I'll also talk a bit about Bison. I'd avoid Bison like the plague that it is. Brooks's advice to "Plan to throw one away" applies here. Robert Corbett (the author of Byacc) wrote Bison as his first attempt at a parser generator. Unfortunately, he gave it to GNU instead of throwing it away. In a classic case of marketing beating technical excellence, Bison is widely used (and even recommended, by those who don't know better) while Byacc remains relatively obscure.

Edit: I hate to do it, but since it's also been mentioned, I'll also comment on Boost.Spirit. While this may be the coolest example of template metaprogramming around, it has a couple of problems that lead me to recommend against trying to put it to serious use.

  1. Compile times with it can get excruciating -- 10 minutes is common, and a larger/more complex grammar can take even longer (assuming it doesn't crash the compiler).
  2. If you make any mistake at all, it can and frequently will produce insanely long error messages that are virtually impossible to decipher. Error messages from template-heavy code are notoriously bad anyway, and Spirit stresses the system more than almost anything else.

Believe me: the fact that you can write something like Spirit at all is right on the border between impressive and amazing -- but I'd still only use it if I was sure the grammar I was dealing with was (and would always remain) quite small and simple.

Jerry Coffin