I'd like to start a project that involves transforming C code while keeping the preprocessor directives intact. I don't want to reinvent the wheel by writing my own C parser, so does anyone know of a front end that can parse both C preprocessor directives and C code, and produce an AST that can be used to regenerate (or pretty-print) the original source?

For example:

#define FILENAME "filename"
#include <stdio.h>

FILE *f=0;
...
if (file_is_open) {
#ifdef CAN_OPEN_IT
    f = fopen(FILENAME, "r");
#else
    printf("Unable to open file.\n");
#endif
}

The above code should be parsed into some in-memory representation that can be used to re-generate the source. In other words, it should not be processed as normal C in two phases, first processing the PP directives and then parsing pure C code. Rather it should represent the whole compile-time logic including the preprocessor variables.
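To make the requirement concrete, here is a minimal sketch of what such an in-memory representation could look like for the conditional block above. Everything in it is an assumption made for illustration: the `Node` type, the `NODE_CODE` and `NODE_PP_IFDEF` kinds, and the `render` and `render_example` functions are hypothetical names, not the API of any existing front end. C statements are kept as opaque text lines rather than parsed, so this only shows the shape of a tree that keeps both branches of an `#ifdef` and can print them back out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds: a line of C text, or an #ifdef/#else/#endif region. */
typedef enum { NODE_CODE, NODE_PP_IFDEF } NodeKind;

typedef struct Node Node;
struct Node {
    NodeKind kind;
    const char *text;      /* NODE_CODE: source line; NODE_PP_IFDEF: macro name */
    const Node *then_body; /* NODE_PP_IFDEF: first node of the #ifdef branch */
    const Node *else_body; /* NODE_PP_IFDEF: first node of the #else branch, or NULL */
    const Node *next;      /* next sibling in the same sequence */
};

/* Append s to buf (capacity cap > 0), always leaving buf NUL-terminated. */
static void append(char *buf, size_t cap, const char *s) {
    size_t used = strlen(buf);
    if (used < cap - 1) {
        strncat(buf, s, cap - 1 - used);
    }
}

/* Regenerate source text from the tree, emitting both branches of each conditional. */
static void render(const Node *n, char *buf, size_t cap) {
    for (; n != NULL; n = n->next) {
        if (n->kind == NODE_CODE) {
            append(buf, cap, n->text);
            append(buf, cap, "\n");
        } else {
            append(buf, cap, "#ifdef ");
            append(buf, cap, n->text);
            append(buf, cap, "\n");
            render(n->then_body, buf, cap);
            if (n->else_body != NULL) {
                append(buf, cap, "#else\n");
                render(n->else_body, buf, cap);
            }
            append(buf, cap, "#endif\n");
        }
    }
}

/* Build the tree for the `if (file_is_open)` block from the question and render it. */
void render_example(char *buf, size_t cap) {
    static const Node close_brace = { NODE_CODE, "}", NULL, NULL, NULL };
    static const Node open_call =
        { NODE_CODE, "    f = fopen(FILENAME, \"r\");", NULL, NULL, NULL };
    static const Node report =
        { NODE_CODE, "    printf(\"Unable to open file.\\n\");", NULL, NULL, NULL };
    static const Node cond =
        { NODE_PP_IFDEF, "CAN_OPEN_IT", &open_call, &report, &close_brace };
    static const Node if_head = { NODE_CODE, "if (file_is_open) {", NULL, NULL, &cond };

    buf[0] = '\0';
    render(&if_head, buf, cap);
}
```

Calling `render_example` with a buffer of a few hundred bytes reproduces the `if` block with its `#ifdef`, `#else` and `#endif` lines in place. The hard part that this sketch sidesteps is that conditionals can wrap arbitrary token sequences rather than whole statements, so a real front end cannot always attach them to the tree this cleanly.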

A: 

Take the GNU gcc compiler: the flag required to preprocess the source is -E, as in gcc -E mysource.c (see here for further information). For pretty-printing there's indent, whose usage is explained here; it is a bit old, but nonetheless worth a mention. There is also cflow, which can produce a map of the source.

Sorry if I misunderstood what you're looking for...

Hope this helps, Best regards, Tom.

tommieb75
Why the downvote? I mentioned indent and cflow... but the question is not exactly clear as to why the AST is needed when its context included 'pretty print'. It would be nice if a downvote came with a comment explaining why; downvoting silently is against the spirit of SO.
tommieb75
I have no idea why someone would downvote you either.
Pascal Cuoq
Downvotes happen; they're a nuisance. Usually, they don't do irreparable damage to your reputation.
Jonathan Leffler
@Jonathan: Quick question: earlier I had 3 upvotes for this answer, http://stackoverflow.com/questions/2142796/in-linux-how-can-i-test-whether-the-output-of-a-program-is-going-to-a-live-termi/2142845#2142845, but it is showing up as 5 instead of 30. Why?
tommieb75
Sorry if it wasn't clear, I'm looking for something that parses C and preprocessor code, not necessarily a pretty printer, but the reason I mentioned this is that a pretty printer probably parses the CPP code. What I want is something that will generate an AST that includes the CPP logic. I don't care about pretty printing per se.
Steve
@Steve: OK, the best answer I can give is to look at ANTLR's grammar list here: http://www.antlr.org/grammar/list. Using ANTLR you can generate an AST, and it has multiple language interfaces (C#, C, C++ and Java can all use the ANTLR libraries for parsing), if that's what you are looking for... :)
tommieb75
@tommieb75: regarding your '5 instead of 30'; I'd guess you reached your 200 limit for the day - after which you get a Mortarboard badge and no more points.
Jonathan Leffler
+1  A: 

Take a look at Clang. (See http://clang.llvm.org/features.html#applications .)

Matthew Slattery
Thanks, this looks like it's along the right lines.
Steve
I don't believe Clang captures preprocessor directives in its ASTs.
Ira Baxter
A: 

You can take a look at http://www.antlr.org/wiki/display/ANTLR3/ANTLR3+Code+Generation+-+C

WizKiranPuttur
This seems to be about (ANTLR) parser generators that produce parsers implemented in C. The OP wants something that *parses* C. Did I miss something?
Ira Baxter
A: 

Our DMS Software Reengineering Toolkit has a C front end (and a C++ front end) that:

  • parses (compilable) C source code in a variety of dialects into ASTs
  • preserves the preprocessor directives, in most cases, as AST nodes
  • can regenerate compilable C code (with comments and preprocessor directives) from the ASTs
  • can collect thousands of files in a single image to allow cross-file analysis and transformation
  • provides full symbol table construction and access
  • provides procedural access to ASTs with a large AST manipulation library, including navigate, inspect, insert, delete, replace, match, ...
  • provides source-to-source transformations using patterns written in the C notation that match against the ASTs

For C (not yet for C++), DMS also provides:

  • control and data flow analysis
  • local and global points-to analysis
  • global call graph construction

DMS has been used to process extremely large C applications for the purposes of extracting facts and generating new, derived code from the original source base.

It can handle the OP's example directly.

Ira Baxter