Hi,

I want to be able to programmatically parse and edit C++ source files. I need to be able to change or add code in certain sections (e.g. in functions, class blocks, etc.). Preferably, I would also be able to get at the comments. Part of what I want to do can be explained by the following piece of code:

CPlusPlusSourceParser cp = new CPlusPlusSourceParser("x.cpp");  // Create C++ Source Parser Object
CPlusPlusSourceFunction[] funcs = cp.getFunctions();  // Get all the functions

for (int i = 0; i < funcs.length; i++) {  // Loop through all functions
    funcs[i].append(/* … code I want to append …*/);  // Append some code to function 
}
cp.save(); // Save new source
cp.close(); // Close file

How can I do that? I'd prefer to do this in Java, C++, Perl, Python or C#. However, I am open to APIs in other languages.

A: 

Very interesting idea. One option, though not an easy one (simply because of the complexity of regexes), would be to use regular expressions in any of the languages you suggested. It wouldn't be too difficult to identify what is a function, what is a variable definition/declaration, etc. It would take some experience with regexes to get it just right, but I think that would do exactly what you're looking for with minimal code.

Chris Thompson
Oh dear, what I know about regexes could be dangerous :)
Kryten
Using regular expressions for parsing non-regular languages (and C++ is *definitely* not regular) is a very bad idea most of the time. It might work for a one-off thing, but it's neither a robust nor a good solution.
Joey
This particular request can be accomplished using the recursive regular expressions available in Perl and PCRE.
ZJR
Well, admittedly, INLINE methods would make it a harder feat. Anyway, most of it boils down to finding the top-level "}" characters and inserting the extra code there.
ZJR
<<Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.>>
Mihai Nita
Unfortunately, not even recursive regular expressions can solve this problem. You can't capture the "dependent name" rules in C++. As a result, you cannot determine whether a particular name represents a type, object or function. And because of that, you cannot determine what `token ( )` means.
MSalters
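Editorial aside on ZJR's suggestion above of locating the top-level "}" characters: what follows is a minimal Java sketch of that brace-counting idea, under loud assumptions. The class name BraceAppender and the method appendToFunctions are hypothetical. The rule "a top-level block whose opening brace directly follows a ')' is a function body" is a heuristic chosen for illustration, not a C++ parser. It misses `) const {`, constructor initializer lists, inline methods inside class blocks, raw string literals, digit separators and anything produced by macros, which is the kind of breakage the other commenters warn about.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BraceAppender {

    /**
     * Inserts snippet before the closing brace of every top-level block whose
     * opening brace directly follows a ')'. Heuristic only (an assumption of
     * this sketch): it does not understand C++ syntax.
     */
    public static String appendToFunctions(String src, String snippet) {
        StringBuilder out = new StringBuilder();
        // One entry per open block; true means "looks like a function body".
        Deque<Boolean> blocks = new ArrayDeque<>();
        char lastCode = '\0'; // last non-whitespace char outside comments
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            // Copy a line comment through unchanged.
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                out.append(src, i, end);
                i = end;
                continue;
            }
            // Copy a block comment through unchanged.
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                int end = src.indexOf("*/", i + 2);
                end = (end < 0) ? n : end + 2;
                out.append(src, i, end);
                i = end;
                continue;
            }
            // Copy string and character literals through unchanged, so that
            // braces inside them are not counted.
            if (c == '"' || c == '\'') {
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    if (src.charAt(j) == '\\') j++; // skip the escaped char
                    j++;
                }
                j = Math.min(j + 1, n);
                out.append(src, i, j);
                i = j;
                lastCode = c;
                continue;
            }
            if (c == '{') {
                // Only blocks opened at depth 0 right after ')' qualify.
                blocks.push(blocks.isEmpty() && lastCode == ')');
            } else if (c == '}' && !blocks.isEmpty()) {
                if (blocks.pop()) {
                    out.append(snippet); // lands just before the closing brace
                }
            }
            out.append(c);
            if (!Character.isWhitespace(c)) lastCode = c;
            i++;
        }
        return out.toString();
    }
}
```

For example, appendToFunctions("void f() { if (x) { y(); } }", "/*X*/") returns "void f() { if (x) { y(); } /*X*/}", while "struct S { int x; };" comes back unchanged. Code appended after a final return statement is unreachable, so even this simple case needs more care than the sketch gives it.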
+3  A: 

You can use any parser generator tool to generate a C++ parser for you, but first you have to get a CFG (context-free grammar) for C++. Check out ANTLR.

Edit:

Also, ANTLR supports a lot of target languages.

Ahmed Said
While there is an ANTLR C++ parser, I don't know of anyone who has used it for production work. And I don't think it produces a symbol table, without which you really can't do anything with C++.
Ira Baxter
+2  A: 

You need a working grammar and parser for C++, which is not easy to come by because C++ can't be handled by most parser generators out there. But once you have a parser, you can take the abstract syntax tree of the program and alter it in nearly any way you want.

Joey
After you have an abstract syntax tree, you will need a symbol table so you know the meaning of symbols. And this is extremely hard to get right. ASTs are NOT enough.
Ira Baxter
@Ira: The OP sounded like s/he just wanted to append some code to functions. If that code doesn't reference variables, it should still be doable. Also, they might be able to make good enough assumptions about how the code looks.
Joey
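Editorial aside on the "alter the AST, then regenerate the source" workflow discussed in this answer: a toy Java sketch, assuming a parser has already produced a tree. Everything here is hypothetical (ToyAst, FunctionNode, TranslationUnit, appendToAll, unparse), and the tree is built by hand rather than parsed from C++. It only shows the shape of a transform followed by an unparse; it has no symbol table, so Ira Baxter's caveat applies in full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToyAst {

    /** Hypothetical AST node: a function signature plus its body statements. */
    public static final class FunctionNode {
        public final String signature;
        public final List<String> body = new ArrayList<>();

        public FunctionNode(String signature, String... statements) {
            this.signature = signature;
            this.body.addAll(Arrays.asList(statements));
        }
    }

    /** Hypothetical root node holding the functions of one source file. */
    public static final class TranslationUnit {
        public final List<FunctionNode> functions = new ArrayList<>();
    }

    /** The transformation: append one statement to every function body. */
    public static void appendToAll(TranslationUnit unit, String statement) {
        for (FunctionNode fn : unit.functions) {
            fn.body.add(statement);
        }
    }

    /** The unparse step: regenerate source text from the tree. */
    public static String unparse(TranslationUnit unit) {
        StringBuilder sb = new StringBuilder();
        for (FunctionNode fn : unit.functions) {
            sb.append(fn.signature).append(" {\n");
            for (String stmt : fn.body) {
                sb.append("    ").append(stmt).append('\n');
            }
            sb.append("}\n");
        }
        return sb.toString();
    }
}
```

A real tool would also have to preserve comments and the original formatting, which this toy unparse step discards.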
+1  A: 

Have a look at the doxygen project. It's an open source project that parses and documents several programming languages, C++ included. I believe using this project's lexer will get you more than halfway.

Alon
+1  A: 

This is similar to http://stackoverflow.com/questions/239722/ast-from-c-code

If you're comfortable with Java, ANTLR can easily parse your code into an abstract syntax tree, and you can then apply transformations to that tree. A default AST transform is to simply print out the original source.

brianegge
A: 

The Mozilla project has a tool that does this.

Max Lybbert
+1  A: 

In a C# -- or general .net -- approach, you might be able to get some use out of the C++/CLI CodeDOM provider -- having not used the C++ version of this type, I don't know how well it would handle code that is template heavy.

Steve Gilham
A: 

A robust C++ parser is available with the DMS Software Reengineering Toolkit. It parses a variety of C++ dialects including ANSI, GNU 3/4, MSVS6, MS Visual Studio 2005 and managed C++.

It builds ASTs and symbol tables (the latter is way harder than you might think). You can navigate the ASTs, transform into different valid C++ programs, and regenerate code including comments.

Ira Baxter