I'm working on a code generator to help unit-test a legacy project that mixes C and C++. I couldn't find an independent tool that generates stub code from declarations, so I decided to build one; it shouldn't be that hard.

Can anybody point me to a standard C++ grammar, preferably one written for yacc?

I hope I'm not reinventing the wheel. If such a tool already exists, please point me to it.

Best Regards, Kevin

+14  A: 

From the C++ FAQ Lite:

38.11 Is there a yacc-able C++ grammar?

The primary yacc grammar you'll want is from Ed Willink. Ed believes his grammar is fully compliant with the ISO/ANSI C++ standard, however he doesn't warrant it: "the grammar has not," he says, "been used in anger." You can get the grammar without action routines or the grammar with dummy action routines. You can also get the corresponding lexer. For those who are interested in how he achieves a context-free parser (by pushing all the ambiguities plus a small number of repairs to be done later after parsing is complete), you might want to read chapter 4 of his thesis.

There is also a very old yacc grammar that doesn't support templates, exceptions, nor namespaces; plus it deviates from the core language in some subtle ways. You can get that grammar here or here.

Jared Oberhaus
If you need to really parse C++, you need machinery that really works. "Not used in anger" means it doesn't work for real C++ code. (I don't understand why this answer was favorited/upvoted so many times given how completely ineffective this answer will be).
Ira Baxter
@Ira: My guess as to why it's upvoted is that there really isn't anything better. Parsing C++ is hard.
David Thornley
Ira is right. You will likely just end up wasting your time. I'm all for building your own, and plunging down the rabbit hole, if what you want to do is learn. But if you want to get a job done it is advisable to get something that works out of the box. The DMS tools have other advantages in that they cover a bunch of languages and have additional features that you may find useful in your project. If your time is worth money (i.e. you are not doing it for fun) then the prices are reasonable.
Andre Artus
+1  A: 

I found this one recently. I haven't tried it out, so am not sure if it works. Could you give more info on the tool you're trying to develop? I downloaded this grammar because I'm working on an instrumentation tool so I can add coverage info for my unit test framework.

Dushara
I'm actually working on something that belongs to a unit-test framework. To test a single translation unit, its external references need definitions before a runnable binary can be produced, so I'm trying to parse the source code to find declarations and generate stub definitions for them.
Kevin Yu
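To make the stub-generation idea concrete, here is a minimal sketch in C++. It is not a parser: it assumes each declaration is a simple free function on a single line, and it uses a regular expression rather than a grammar. The name `make_stub` and the choice of `return {};` as the stub body (which needs C++11 and a value-initializable return type) are my own assumptions for illustration. Function pointers, templates, `extern "C"` blocks, and anything spanning several lines are not handled, and a call statement such as `return foo(1);` would be misread as a declaration, which is exactly why a real grammar is wanted.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns one simple free-function declaration into a
// stub definition. Returns an empty string when the line does not match.
std::string make_stub(const std::string& decl) {
    // Groups: (1) return type, (2) function name, (3) parameter list.
    // Nested parentheses in the parameter list are deliberately rejected.
    static const std::regex pattern(
        R"re(^\s*(?:extern\s+)?(.+?)\s*\b([A-Za-z_]\w*)\s*\(([^()]*)\)\s*;\s*$)re");

    std::smatch m;
    if (!std::regex_match(decl, m, pattern)) {
        return std::string();
    }

    const std::string ret = m[1].str();
    const std::string name = m[2].str();
    const std::string params = m[3].str();

    // A void function gets an empty body; anything else returns a
    // value-initialized object (an assumption, see the note above).
    const std::string body = (ret == "void") ? "{ }" : "{ return {}; }";
    return ret + " " + name + "(" + params + ") " + body;
}
```

For example, `make_stub("int foo(int a, char *b);")` yields `int foo(int a, char *b) { return {}; }`, while a variable declaration such as `int x;` yields an empty string.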
+2  A: 

Jared's link is the closest thing to a context-free grammar you can get. Certain things do need to be delayed for later, but that is by some arguments better than the context-sensitive grammar of C++.

To make things worse, C++1x will complexify the grammar significantly. To get as far as a perfect parse of C++, a parser will need to implement enough of the standard to correctly do overload resolution, including template argument deduction, which in turn will require the concepts mechanism, lambdas, and in effect almost all of the language, except for two-stage name lookup and exception specifications which, if I recall correctly, do not need actual implementation to parse a program successfully.

In effect, you are halfway to a compiler if you can parse C++.
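A small illustration of the context sensitivity, using my own example rather than anything from the FAQ or Ed Willink's grammar: the tokens `a * b` form a declaration when `a` names a type and a multiplication when `a` names a variable. The namespace and function names below are made up for the demonstration.

```cpp
namespace a_is_type {
    struct a {};

    // Here `a` names a type, so `a * b` declares b as a pointer to a.
    inline bool b_is_null_pointer() {
        a * b = nullptr;
        return b == nullptr;
    }
}

namespace a_is_value {
    // Here `a` names a variable, so `a * b` is a multiplication.
    inline int product() {
        int a = 6, b = 7;
        return a * b;
    }
}
```

A parser that sees only the tokens cannot choose between the two readings. It either looks `a` up while parsing or keeps both parses and resolves the ambiguity afterwards.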

coppro
If you can't do name resolution completely, you are nowhere near a C++ compiler. Parsing is much easier than name resolution.
Ira Baxter
No, because parsing requires name resolution; that's my point. C++'s grammar is that bad.
coppro
C++ parsing does NOT require name resolution if you use a GLR parser. In fact, it is pretty easy and we do it with our DMS tool every day (www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html). If you insist on using an LALR(1) parser that cannot tolerate local ambiguity, *then* you have to name resolve as you parse and I agree that's a mess, but then there's your reason for not doing it that way. Doing name resolution for C++ even with local ambiguities is still pretty hard, I will grant, but not nearly as nasty as when tangled with the parser.
Ira Baxter
... and our C++ front end does all that name resolution, too. You're still nowhere near a C++ compiler: you still need flow analysis, optimizing transforms, low-level code generation, register assignment, optimization, ...
Ira Baxter
A: 

Our DMS Software Reengineering Toolkit can be obtained with a robust, full-featured C++ parser. See http://www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html. The parser builds ASTs and symbol tables, and can infer the type of any expression. DMS enables one to carry out arbitrary analyses and transformations on the C++ code.

One "simple" transformation is instrumenting the code to collect test coverage data; we offer this as a COTS tool. See this paper to understand how DMS does it: http://www.semanticdesigns.com/Company/Publications/TestCoverage.pdf

Ira Baxter
A: 

For another approach, you could consider piggy-backing on an existing compiler.

GCC-XML will "compile" C++ into XML files with a lot of useful information; it may be enough for your purposes.

Unfortunately, GCC-XML is only 1/4-maintained, and getting it to work can be...interesting. Good luck, if you go this route.

Walter Mundt