
A few semesters back I had a class where we wrote a very rudimentary Scheme parser and eventually an interpreter. After the class, I converted my parser into a C++ parser that did a reasonably good job of parsing C++ as long as I didn't do anything fancy with the preprocessor or macros. I could use it to read over my classes and functions and do neat things like automatically generate class readers or writers, or set up function callbacks from a text file.

However, my program is pretty limited. I'm sure I could spend some time making it more robust and doing more neat things, but I don't want to spend the time and effort if more robust tools that do the same thing are already available. I figure there has to be something like this out there, since parsers are an essential part of compilers, but I haven't seen tools specifically for automatic code generation that make it easy to walk through and manipulate data structures representing C++ classes, functions, and variables. Are there tools that do this?

Edit:

Hopefully this will clarify a little bit of what I'm looking for. The program I have runs as a prebuild step in Visual Studio. It reads over my source files and makes a list of classes, their members, their functions, etc., which is then used to generate new code. Currently I just use it to make it easy to read and write my data structures to a plain text file, but I could do other things as well. The file readers and writers are output into plain .cpp and .h files, which I include in the rest of my project just as I would any other file. What I'm looking for are tools that do similar things, so I can decide whether to continue using my own or switch to a better solution. I'm not looking for anything that generates machine code or edits code that I've written.

+1  A: 

Maybe Boost.Serialization or ANTLR?

Johnicholas
+2  A: 

The C++ FAQ Lite has references to YACC grammars for C++. YACC is an old-school parser generator: clumsy and difficult to learn, but very powerful. Nowadays, you'd use GNU Bison instead of YACC.

David Thornley
The GNU guys gave up using Bison to parse C and C++.
Ira Baxter
Probably a good idea. Exactly why C syntax is the way it is I may never know.
David Thornley
+6  A: 

A complete parser-building tool like ANTLR or YACC is necessary if you want to parse C++ from scratch, but it's overkill for your purposes.

It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code.

Two main options:

  • GCC-XML can generate a list of classes, members, and functions. The distribution version on their web site is quite old; try the CVS version instead. I don't know about the availability of a Windows port.
  • Doxygen is designed for producing documentation, but it can also produce an XML output, which you should be able to use to do what you want.

Currently I just use it to make it easy to read and write my data structures to a plain text file...

This is known as serialization. Try Boost.Serialization or maybe libs11n or Google Protocol Buffers. Stack Overflow has further discussion.

...but I could do other things as well.

Other cool applications of this kind of automatic code generation include reflection (inspecting your objects' members at runtime, using duck typing with C++, etc.) and generating wrappers for calling C++ from scripting languages. For a C++ reflection library, see Reflex. For an example of generating wrappers for scripting languages, see Boost.Python or SWIG.

Josh Kelley
Thank you. I have been banging my head for 10 minutes trying to remember what GCC-XML was called!
Duck
+2  A: 

Don't forget about Cog. It requires you to know Python. In essence it embeds the output of Python scripts into your code. It's absurdly easy to use, but it takes a totally different approach from things like ANTLR and its purpose is somewhat different.

Brian
A: 

Mozilla developed Pork for this kind of thing. I can't say it's easy to use (or even to build), but it is in production.

Max Lybbert
+1  A: 

I answered a similar question (about splitting source files into separate header and .cpp files) by suggesting the use of lzz.

lzz has a very powerful C++ parser that builds a representation of everything except the bodies of functions. As long as you don't need the contents of the function bodies, you could modify lzz so that it performs the generation step you want.

Richard Corden
+1  A: 

If you want tools that can parse production C++ code, and carry out arbitrary analyses and transformations, see the DMS Software Reengineering Toolkit and its C++ front end.

Ira Baxter