I'm currently shopping for a FOSS parser generator for a project of mine. It has to support either C or C++.

I've looked at bison/flex and at boost::spirit.

I went from writing my own to spirit to bison to spirit to bison to spirit, each time hit by some feature I found unpleasant.

The thing I hate most about bison/flex is that they actually generate C/C++ source for you. There are a number of disadvantages to this, e.g. debugging. I like spirit from this point of view, but I find it very very heavy on syntax.

I am curious about what you are using, what you would recommend, and general thoughts about the state of the art in parser generators. I am also curious to hear about approaches being used in other languages for parsing problems.

+21  A: 

Antlr isn't bad and it has a built in debugger. The package also comes with an API for C (among other available languages).

Kevin Loney
"Not bad" is underestimation, with AntlrWorks the word is "awesome" :)
utku_karatas
+1, Antlr's pretty good :)
orip
+3  A: 

I am curious about what you are using, what you would recommend, and general thoughts about the state of the art in parser generators.

I'm using the GOLD parser at http://www.devincook.com/goldparser/ ... because:

  • I'm not experienced with or formally educated in parsing, and I found it easy to learn and use
  • It says that it supports several languages (including C, C++, and C#).
ChrisW
GOLD is still based on the tired old LALR model, for which it is a pain to write the grammar. GOLD's main claim on its web page is that it is easy to support lots of programming languages. That's a good thing, but for C and C++, better alternatives are available.
Norman Ramsey
I don't like it. Seems to be windows-only and the documentation seems unclear.
I found I was able to make a CSS grammar for it: which (parsing CSS) is what I wanted a parser for. That was my first and only experience with parsing: so presumably CSS is easy to parse, or the tool is good, and/or I was lucky.
ChrisW
So, that's my experience. Other tools might be better, perhaps even more powerful in some ways, but this one suited me. I knew when I chose it that Antlr was the more famous project.
ChrisW
A: 

Flex has a way to configure it to generate C++ (and perhaps Bison does as well, though I'm unsure of that). I recall trying to use this in the final project for my compilers class and finding it nearly undocumented, so I fell back to using C. That was a year and a half ago, so maybe it's gotten better since then. There's definitely a section in the man page on it though. I'm not sure that's helpful, but at least it's something you can try :)

rmeador
It's not a how-to, but the man pages for both tools are very thorough (though they can be heavy going).
Martin York
I don't really care if it's C or C++. I want something with nice syntax.
+14  A: 

I use FLEX and Bison.
Both have the ability to generate C++ code (via command line flags or directives in the file).

I hear Antlr is good but have never used it personally.

Martin York
A: 

There is plenty of good documentation on Antlr, and it has a very nice Eclipse plugin, so I recommend it. Unfortunately, I have no experience with the other options.

systemsfault
+9  A: 

Please don't use bison/flex or yacc/lex. They parse very efficiently but are really hard on the programmer. Use a more modern parser generator with a better user interface. ANTLR is a good suggestion, and you might also consider

Norman Ramsey
I'm not familiar with the theory behind packrat, and the implementations for C and C++ seem limited. I respect Elkhound very much (esp. for Elsa), but it seems too heavyweight for what I need. +1
The main thing is not to limit yourself to LALR grammars. It just makes your grammar less readable and causes you unnecessary suffering. Maybe you'll like ANTLR?
Norman Ramsey
The elkhound parser seems to have moved to http://scottmcpeak.com/elkhound/ if somebody is looking for it now.
dajobe
+14  A: 

I've been very happy using spirit. Yes, the syntax can take some getting used to but it's flexible and powerful.

If your code is in C++ it's the most elegant solution IMHO since a) it integrates beautifully with your code (particularly with the design of actions) and b) you don't need to run a code generator as a separate build step.

I'd suggest looking into it some more before dismissing it.

Antlr is great if you're using other languages, but when I'm using C++ Antlr feels clunky and awkward compared to using spirit. I've drunk the kool-aid; spirit FTW! ;)
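To illustrate point (b) for readers who have not seen this style: the sketch below writes a grammar directly as C++ expressions, so there is no generated source and no extra build step. It is a hand-rolled toy and not Spirit's actual API; every name in it (Parser, ch, digits, many, matches_all, number_list) is hypothetical and chosen only for illustration, and it assumes a C++17 compiler.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// A parser takes the input and a start position. It returns the position
// just past a successful match, or std::nullopt on failure.
using Parser =
    std::function<std::optional<std::size_t>(const std::string&, std::size_t)>;

// Matches one specific character.
Parser ch(char expected) {
    return [expected](const std::string& in,
                      std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size() && in[pos] == expected) return pos + 1;
        return std::nullopt;
    };
}

// Matches one or more decimal digits.
Parser digits() {
    return [](const std::string& in,
              std::size_t pos) -> std::optional<std::size_t> {
        std::size_t end = pos;
        while (end < in.size() &&
               std::isdigit(static_cast<unsigned char>(in[end]))) {
            ++end;
        }
        if (end == pos) return std::nullopt;
        return end;
    };
}

// Sequence: a followed by b.
Parser operator>>(Parser a, Parser b) {
    return [a, b](const std::string& in,
                  std::size_t pos) -> std::optional<std::size_t> {
        auto mid = a(in, pos);
        if (!mid) return std::nullopt;
        return b(in, *mid);
    };
}

// Repetition: zero or more matches of p. Never fails.
Parser many(Parser p) {
    return [p](const std::string& in,
               std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = p(in, pos)) {
            if (*next == pos) break;  // guard: p matched without consuming input
            pos = *next;
        }
        return pos;
    };
}

// True when the parser consumes the entire input.
bool matches_all(const Parser& p, const std::string& in) {
    auto end = p(in, 0);
    return end && *end == in.size();
}

// Grammar written inline in C++: list ::= digits (',' digits)*
Parser number_list() {
    return digits() >> many(ch(',') >> digits());
}
```

With these definitions, matches_all(number_list(), "1,22,333") accepts the input, while "1,,2" and "1,2," are rejected. The grammar is ordinary C++ that you can step through in a debugger, which is the property being praised here. The price, as the question notes, is that the grammar has to fit into C++ operator syntax.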

MattyT
I have looked into it; currently my project is written using spirit. I'm not happy, particularly with the AST stuff.
Could you kindly elaborate? http://stackoverflow.com/questions/432173/what-are-the-disadvantages-of-the-spirit-parser-generator-framework-from-boost-or
Norman Ramsey
I'd like some further information too...
MattyT
+12  A: 

I'd recommend looking a little at the Lemon parser generator used in SQLite

Lemon

Lemon is an LALR(1) parser generator for C or C++. It does the same job as "bison" and "yacc". But lemon is not another bison or yacc clone. It uses a different grammar syntax which is designed to reduce the number of coding errors. Lemon also uses a more sophisticated parsing engine that is faster than yacc and bison and which is both reentrant and thread-safe. Furthermore, Lemon implements features that can be used to eliminate resource leaks, making it suitable for use in long-running programs such as graphical user interfaces or embedded controllers.

epatel
It seems reasonable, but not impressive. In particular, the documentation could use some work.
It doesn't look flashy, but coupled with a decent lexer (I suggest ragel), it is the best parser generator I have used. And that documentation is enough, because lemon is really simple to use.
artificialidiot
Thank you very much for this suggestion - very nice piece of software!
milan1612
+1  A: 

If you understand the theory of lexing and parsing you can use Flex and Bison to generate the state machine tables for you and implement the lexer and parser yourself (or re-implement the templates that come with Bison and Flex) to get rid of the things you don't like about them.

I've done this at one time, and it's nice in so far as you can have your own lexer and parser written to your specifications, in your application's style, with your own coding standards and debugging features, but you use the well coded algorithms inside Flex and Bison to generate the state transition tables for you. And I'd wager to say that creating the tables is probably the more complicated problem.

So in summary: Use flex and bison to generate your state transition tables, which are then used by your own lexer and parser.
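A minimal sketch of what such a hand-written driver can look like on the lexer side, assuming a tiny table for identifiers and integers. The table here is written by hand and only stands in for one a generator would produce; it is not Flex's actual table format, and all names (kTransitions, kAccepts, classify, tokenize) are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Character classes index the columns of the transition table.
enum CharClass { Letter = 0, Digit = 1, Other = 2 };
enum TokenKind { Ident, Number, Error };

struct Token {
    TokenKind kind;
    std::string text;
};

constexpr int kDead = -1;

// Hand-written stand-in for a generated table (an assumption of this sketch).
// Rows: states (0 = start, 1 = in identifier, 2 = in number).
// Columns: character classes (Letter, Digit, Other).
constexpr int kTransitions[3][3] = {
    {1, 2, kDead},      // start
    {1, 1, kDead},      // identifier: letters or digits may follow
    {kDead, 2, kDead},  // number: only digits may follow
};

// Token kind accepted in each state; the start state accepts nothing.
constexpr TokenKind kAccepts[3] = {Error, Ident, Number};

CharClass classify(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isalpha(u) || c == '_') return Letter;
    if (std::isdigit(u)) return Digit;
    return Other;
}

// The driver: your own code, in your own style, walking the table.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (std::isspace(static_cast<unsigned char>(input[pos]))) {
            ++pos;  // whitespace separates tokens and is skipped
            continue;
        }
        int state = 0;
        std::size_t end = pos;
        // Follow transitions as far as possible (longest match).
        while (end < input.size()) {
            int next = kTransitions[state][classify(input[end])];
            if (next == kDead) break;
            state = next;
            ++end;
        }
        if (end == pos) {
            // No transition out of the start state: one-character error token.
            tokens.push_back({Error, input.substr(pos, 1)});
            ++pos;
        } else {
            tokens.push_back({kAccepts[state], input.substr(pos, end - pos)});
            pos = end;
        }
    }
    return tokens;
}
```

Calling tokenize("x1 42 +") yields an Ident "x1", a Number "42", and an Error token for "+". The driver loop is only a few lines and fully under your control (error reporting, debugging hooks, coding style), while the table is the part you would leave to the tool. A parser driver follows the same idea with a stack and shift/reduce action tables.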

Daemin
I don't understand why I would want to do that. I'm not working on something particularly complicated; if I need to write some part on my own, I might as well go all the way and write it all.
Well you said that you don't like the C code that gets generated. Therefore if you use Flex and Bison to generate the tables for you, not the code, you can write your own manipulations (or code what they have done in the template) and replace it with something that fits into your framework better.
Daemin
A: 

Visual ++ Parser

Vinay
Isn't Visual Parser Java only?
Kevin Loney
It is for C++. I am using it in C++.
Vinay
Can you include a link to it (C++ version)?
+1  A: 

A state-of-the-art parser generator is the DMS Software Reengineering Toolkit. (I'm the architect).

It isn't FOSS, but you asked specifically about the state of the art.

It isn't so much a parser generator, as a complete ecosystem for building tools that process formal documents (programs, specifications, hardware designs, anything that has a "formal syntax/semantics").

DMS provides

  • lexers with full Unicode capability and ability to read a huge variety of input encoding formats (ascii, UTF-8/16, EBCDIC, ...)
  • full context-free parsing (infinite lookahead and built-in error recovery)
  • automatically builds abstract syntax trees, determining which productions are lists. The syntax trees capture comments in the text.
  • provides direct support for building tree-structured analyzers called "attribute grammar evaluators"
  • provides symbol table construction support that has been proven to be capable of handling nasty languages such as C++
  • provides pretty printers to regenerate valid source text from the trees, including regenerating valid comments
  • source-to-source rewrite rules to allow you to define program transformations using the syntax of the language of interest
  • provides control flow, data flow, call graph, and global points-to analysis machinery
  • has tested front ends for C, C++, Java, and COBOL, all of which build symbol tables and construct the various flow analyses above
  • has front ends for a variety of other languages, including C# (4.0), PHP, Ada, ...

One of the tests of fire for a "state of the art" parser generator is its ability to parse C++. DMS parses C++, does all the symbol table construction, etc. and has been used to carry out massive transformations automatically on C++ code.

Other "parser generators" tend to provide at best parsing ability and leave you to build your own trees and all of the rest of the above stuff if you have the heart and the years to do it.

ANTLR is a bit better in that it does provide support for tree building and some syntax-directed pattern matching. ANTLR sort of passes the C++ trial by fire; there is a C++ front end for ANTLR. To the best of my knowledge, it is incomplete and doesn't have symbol table support, and I don't know of any uses of it for production tasks.

ELSA succeeds at C++ (and symbol tables) by virtue of being focused on parsing C++. The foundation machinery (Elkhound) behind ELSA uses the same GLR parsing algorithms as DMS. But I don't believe that Elkhound is widely used for anything but to support ELSA.

At the risk of being immodest, I would suggest that DMS defines the state of the art. (I'll agree that ANTLR is pretty good for what it does).

You can get more detailed comparisons of DMS to many other systems here.

Ira Baxter