Hello all.

I've got a set of function definitions written in a C-like language with some additional keywords that can be placed before some arguments (the same way as "unsigned" or "register", for example), and I need to analyze these lines as well as some function stubs and generate actual C code from them.

  • Is it correct that Flex/Yacc are the most appropriate way to do it?

  • Will it be slower than writing a Shell or Python script using regexps (which, I suppose, may become a big pain if the number of additional keywords grows and their effects are rather different), given that I have zero experience with analyzers/parsers (though I know how LALR does its job)?

  • Are there any good materials on Lex/Yacc that cover similar problems? All papers I could find use the same primitive example of a "toy" calculator.

Any help will be appreciated.

+2  A: 

ANTLR has pretty much superseded Lex/Yacc.

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

Mitch Wheat
Your view is a bit restrictive, I think...
LB
@Mitch Wheat: I believe he is referring to your comment about ANTLR superseding Lex/Yacc.
Jordan S. Jones
+1  A: 

That entirely depends on your definition of "effective". If you have all the time in the world, the fastest parser would be a hand-written pull parser. Such parsers take a long time to develop and debug, but today no parser generator beats hand-written code in terms of runtime performance.
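To give a feel for what "hand-written pull parser" means, here is a minimal sketch in Python: the parser keeps a one-token lookahead and pulls the next token from a lazy tokenizer only when a grammar rule needs it, with one method per rule. It only handles simple prototypes such as `int f(inout int *p, unsigned n);`, and the keywords `inout`/`out` are hypothetical stand-ins for the question's extra keywords.

```python
import re
from typing import Iterator, List, NamedTuple, Optional, Tuple

# Hypothetical extra keywords standing in for the dialect's additions.
EXTRA_KEYWORDS = {"inout", "out"}

# Identifiers, or any other single non-space character (enough for prototypes).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")


class Token(NamedTuple):
    kind: str  # "ident" or "punct"
    text: str


def tokens(src: str) -> Iterator[Token]:
    """Lazily yield tokens; nothing is scanned until the parser asks for it."""
    for m in TOKEN_RE.finditer(src):
        text = m.group()
        kind = "ident" if text[0].isalpha() or text[0] == "_" else "punct"
        yield Token(kind, text)


class Parser:
    def __init__(self, src: str) -> None:
        self._toks = tokens(src)
        self._peek: Optional[Token] = next(self._toks, None)  # one-token lookahead

    def _advance(self) -> Optional[Token]:
        tok = self._peek
        self._peek = next(self._toks, None)  # pull the next token on demand
        return tok

    def _at(self, text: str) -> bool:
        return self._peek is not None and self._peek.text == text

    def _expect(self, text: str) -> Token:
        tok = self._advance()
        if tok is None or tok.text != text:
            raise SyntaxError(f"expected {text!r}, got {tok!r}")
        return tok

    def parse_function(self):
        """decl := type-and-name '(' [param {',' param}] ')' ';'"""
        ret, name = self._type_and_name()
        self._expect("(")
        params = []
        if not self._at(")"):
            params.append(self._param())
            while self._at(","):
                self._advance()
                params.append(self._param())
        self._expect(")")
        self._expect(";")
        return name, ret, params

    def _param(self):
        """param := {extra-keyword} type-and-name"""
        extras = []
        while self._peek is not None and self._peek.text in EXTRA_KEYWORDS:
            extras.append(self._advance().text)
        ctype, name = self._type_and_name()
        return extras, ctype, name

    def _type_and_name(self) -> Tuple[str, str]:
        """Collect identifiers and '*'; the last identifier is the declared name."""
        words: List[str] = []
        while self._peek is not None and (self._peek.kind == "ident" or self._peek.text == "*"):
            words.append(self._advance().text)
        if not words or words[-1] == "*":
            raise SyntaxError("expected a declarator name")
        return " ".join(words[:-1]), words[-1]
```

For example, `Parser("int f(inout int *p, unsigned n);").parse_function()` returns `("f", "int", [(["inout"], "int *", "p"), ([], "unsigned", "n")])`. Handling real C declarators (arrays, function pointers, qualifiers) is where the development and debugging time mentioned above goes.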

If you want something that can parse valid C within a week or so, use a parser generator. The code will be fast enough and most parser generators come with a grammar for C already which you can use as a starting point (avoiding 90% of the common mistakes).

Note that regexps are not suitable for parsing recursive structures. This approach would be both slower than using a generator and more error-prone than a hand-written pull parser.
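To make the point about recursive structures concrete, here is a small Python sketch using nested parentheses, which appear in C declarations such as a parameter that is a pointer to a function taking a function pointer. A regex without recursion support (Python's `re` has none) can only be written for a fixed nesting depth, while a few lines of recursive code handle any depth. The example strings and names are illustrative only.

```python
import re

# Without recursion, a regex must spell out each nesting level by hand.
# This one accepts an outer group whose inner groups contain no further parentheses.
DEPTH_2 = re.compile(r"\((?:[^()]|\([^()]*\))*\)")


def match_group(s: str, i: int = 0) -> int:
    """Return the index just past the balanced group starting at s[i] == '('.

    The recursion mirrors the grammar rule: group := '(' { non-paren | group } ')'.
    """
    if i >= len(s) or s[i] != "(":
        raise SyntaxError(f"expected '(' at {i}")
    i += 1
    while i < len(s):
        if s[i] == "(":
            i = match_group(s, i)  # recurse into the nested group
        elif s[i] == ")":
            return i + 1
        else:
            i += 1
    raise SyntaxError("unbalanced parentheses")


shallow = "(int (*)(char))"
deep = "(int (*)(int (*)(char)))"
```

`DEPTH_2` accepts `shallow` but rejects `deep`, and every extra level would need another hand-written layer in the pattern; `match_group` handles both strings unchanged.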

Aaron Digulla
And the last 10% will take you another year because of C's context sensitivity. Ask the GNU guys.
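A well-known example of C's context sensitivity is the typedef-name ambiguity: `a * b;` declares a pointer `b` if `a` is a typedef name, and is an expression statement (a multiplication) otherwise. A common workaround, usually called the "lexer hack", is to let the parser record typedef names in a table that the lexer consults when classifying identifiers. The toy Python sketch below only models that feedback loop; the class and method names are made up for illustration, and it understands just two statement shapes.

```python
class Classifier:
    """Toy model of the 'lexer hack': the parser feeds typedef names back to the lexer."""

    def __init__(self) -> None:
        self.typedef_names = set()

    def classify_ident(self, name: str) -> str:
        # The token kind for a name depends on what has been parsed so far.
        return "TYPE_NAME" if name in self.typedef_names else "IDENTIFIER"

    def statement(self, stmt: str) -> str:
        """Handle only 'typedef <type> <name>;' and '<a> * <b>;'."""
        toks = stmt.replace("*", " * ").replace(";", " ; ").split()
        if len(toks) == 4 and toks[0] == "typedef" and toks[3] == ";":
            self.typedef_names.add(toks[2])  # the parser updates the lexer's table
            return "typedef"
        if len(toks) == 4 and toks[1] == "*" and toks[3] == ";":
            if self.classify_ident(toks[0]) == "TYPE_NAME":
                return f"declaration of pointer {toks[2]}"
            return "multiplication"
        raise SyntaxError(f"unsupported statement: {stmt!r}")
```

The same text `a * b;` yields "multiplication" before `typedef int a;` has been seen and "declaration of pointer b" afterwards, which is why a context-free grammar alone cannot parse C.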
Ira Baxter
+1  A: 

There is also the Lemon parser generator, which features a less restrictive grammar. The downside is that you're married to Lemon: rewriting a parser's grammar for something else when you discover some limitation sucks. The upside is that it's really easy to use and self-contained. You can drop it into your source tree and not worry about checking for the presence of other tools.

SQLite3 uses it, as do several other popular projects. I'm not saying use it because SQLite does, but perhaps give it a try if time permits.

Tim Post
+1  A: 

Actually, it depends on how complex your language is and whether it's really close to C or not.

Still, you could use lex as a first step, even for a regular-expression approach.
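Using lex as a first step means splitting the input into classified tokens before any parsing; that is also the natural place to recognize the extra keywords from the question. The same idea can be sketched in Python with one combined regular expression of named groups (a Python analog, not lex itself). One difference worth knowing: lex picks the longest match, whereas Python's alternation takes the first alternative that matches, so rule order matters here. The keywords `inout` and `out` are hypothetical.

```python
import re

# Real C keywords plus hypothetical dialect keywords; adding one is a one-word change.
KEYWORDS = {"int", "char", "void", "unsigned", "register", "inout", "out"}

# One rule per token class. Python tries alternatives left to right (first match),
# unlike lex's longest-match rule, so more specific patterns must come first.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[(){},;*]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src):
    """Yield (kind, text) pairs, skipping whitespace."""
    for m in MASTER.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at {m.start()}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # keywords are matched as identifiers, then looked up
        yield kind, text
```

Because keywords are looked up in a table after matching, a growing set of extra keywords does not make the patterns any bigger; what those keywords mean is then the parser's job.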

I would go for lex + Menhir and OCaml.

But any flex/yacc combination would be fine.

The main problem with plain Bison (the GNU implementation of yacc) stems from C's typing: you have to describe your whole tree (and all the manipulation functions) yourself. Using OCaml would be much easier.
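To illustrate "describing the whole tree and the manipulation functions": in OCaml the tree is a few variant types and each manipulation is a pattern match, while in C each node needs a struct, a tag, constructors, and cleanup code. Below is a rough Python analog using dataclasses (not OCaml). The node names and the code-generation rule (turning the hypothetical `inout` keyword into a comment and emitting a plain C prototype) are illustrative assumptions, not something the question specifies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Param:
    ctype: str  # e.g. "int *"
    name: str
    extras: List[str] = field(default_factory=list)  # hypothetical dialect keywords


@dataclass
class FuncDecl:
    ret: str
    name: str
    params: List[Param]


def to_c(node) -> str:
    """Manipulation function: turn a dialect AST node into plain C text."""
    if isinstance(node, FuncDecl):
        args = ", ".join(to_c(p) for p in node.params) or "void"
        return f"{node.ret} {node.name}({args});"
    if isinstance(node, Param):
        # Hypothetical rule: C has no equivalent, so keep the keywords as a comment.
        note = f"/* {' '.join(node.extras)} */ " if node.extras else ""
        sep = "" if node.ctype.endswith("*") else " "
        return f"{note}{node.ctype}{sep}{node.name}"
    raise TypeError(f"unknown node {node!r}")
```

For instance, `FuncDecl("int", "f", [Param("int *", "p", ["inout"]), Param("unsigned", "n")])` prints as `int f(/* inout */ int *p, unsigned n);`. A real translator would replace the comment with whatever C code each keyword is supposed to generate.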

LB
A: 

For what you want to do, our DMS Software Reengineering Toolkit is likely a very effective solution.

DMS is designed specifically to support custom analyzers/code generators of the type you are discussing. It provides very strong facilities for defining arbitrary language parsers/analyzers (tested on 30+ real languages including several complete dialects of C, C++, Java, C#, and COBOL).

DMS automates the construction of ASTs (so you don't have to do anything but get the grammar right to have a usable AST), enables the construction of custom analyses of exactly the pattern-directed inspection you indicated, can construct new C-specific ASTs representing the code you want to generate, and spit them out as compilable C source text. The pre-existing definitions of C for DMS can likely be bent to cover your C-like language.

Ira Baxter
That sounds like a shameless plug for a product you have to fork out loads of cash for... Funny, you have mentioned the exact same thing at http://stackoverflow.com/questions/526797/good-tools-for-creating-a-c-c-parser-analyzer as well. This is a place for programming questions, not for promoting commercial software.
tommieb75
The question was, "What's the best way to do this?" Answers should reasonably include software that helps, commercial or not. And yes, I'm biased, since I conceived this answer over 15 years ago to respond to exactly this kind of question.
Ira Baxter