views:

1836

answers:

11
+5  Q: 

AST from C code

I want to perform some transformations on C source code. I need a tool on Linux that generates a complete AST from the source code, so that I can apply my transformations to this AST and then convert it back to C source code. I tried ELSA but it is not getting compiled. (I am using Ubuntu 8.4). Can anyone suggest a better tool/application?

A: 

How about taking gcc and writing a custom backend for it? I've never done it nor even worked on gcc source code, so I don't know how hard it would be.

Roel
And how are you going to get it to regenerate the source code after the transformations are applied?
Ira Baxter
+3  A: 

www.antlr.org

kenny
While the default ANTLR distribution doesn't contain a C parser, there are a number of them floating around; just google them. Regards, Sebastiaan
Sebastiaan Megens
There are ANTLR-based C parsers. I don't know if any of them can regenerate source from a (modified) AST.
Ira Baxter
+3  A: 

There are two projects that I'm aware of and that you could find useful:

They both parse standard C source code to allow further analysis and transformation. I've not used them, so you'll have to check for yourself whether they fit your needs.

The suggestion of using GCC is also valid, of course. I know there's not much documentation on this aspect of gcc, though.

Remo.D
CIL doesn't regenerate source code, AFAIK.
Ira Baxter
A: 

You can try generating an AST (Abstract Syntax Tree) using Lex and Yacc on Linux:

lex and yacc

from lex and yacc to ast

milot
The problem would be having a reasonably complete lex grammar for C, which is not an easy task due to the C preprocessor, typing rules, etc.
Remo.D
Yes, I know, but lex and yacc are very powerful tools. I've experimented with them a little, so I thought this would help someone with this question. C is a fairly primitive language, but of course a complete grammar is still not an easy task; I agree with you completely.
milot
+6  A: 

I would recommend clang. It has a fairly complete C implementation with most gcc extensions, and the code is very understandable. Their C++ implementation is incomplete, but if you only care about generating ASTs from C code that should be fine. Depending on what you want to do you can either use clang as a library and work with the ASTs directly, or have clang dump them out to console.

Louis Gerbarg
AFAIK, clang won't regenerate C code from the AST.
Ira Baxter
It absolutely does. That is how all of the clang-cc rewrite functionality works. For a concrete example, check out http://llvm.org/svn/llvm-project/cfe/trunk/lib/Frontend/RewriteBlocks.cpp which is what happens when you execute `clang-cc -rewrite-blocks`.
Louis Gerbarg
I stand corrected.
Ira Baxter
A: 

I believe OpenC++ is as close as you'll get right now.

+3  A: 

See pycparser - a pure-Python AST generator for C.
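
As a rough illustration of the parse / transform / regenerate round trip the question asks for, here is a minimal sketch using pycparser's `c_parser`, `c_ast.NodeVisitor` and `c_generator` modules. The helper name `rename_calls` and the sample input are made up for this example. Note that pycparser expects already-preprocessed source and does not keep comments, so the regenerated code is pycparser's own formatting rather than the original layout.

```python
def rename_calls(source, old, new):
    """Parse C source, rename calls to `old` into `new`, regenerate C text."""
    # Imported here so this sketch still loads where pycparser
    # (a third-party package) is not installed.
    from pycparser import c_ast, c_generator, c_parser

    class CallRenamer(c_ast.NodeVisitor):
        def visit_FuncCall(self, node):
            # node.name is an ID node for ordinary calls like f(x)
            if isinstance(node.name, c_ast.ID) and node.name.name == old:
                node.name.name = new
            # keep walking so nested calls in the arguments are handled too
            self.generic_visit(node)

    ast = c_parser.CParser().parse(source, filename="<input>")
    CallRenamer().visit(ast)                     # transform the AST in place
    return c_generator.CGenerator().visit(ast)   # convert AST back to C text


if __name__ == "__main__":
    code = "int main(void) { return old_fn(old_fn(1)) + 2; }"
    print(rename_calls(code, "old_fn", "new_fn"))
```

For real code you would normally run the C preprocessor first and pass its output to the parser.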

Eli Bendersky
A: 

The DMS Software Reengineering Toolkit has been used on huge C systems, parsing, analyzing, transforming, and regenerating C code. Doesn't run on Linux, just Windows, but it does handle Linux-style (GCC) C code.

I can't emphasize enough the ability to round-trip the C source code: parse, build trees, transform, regenerate compilable C code with the comments and either prettyprinted or with the original programmer's indentation. Few of the other answers here suggest systems that can do that robustly.

The fact that DMS is designed to carry out program transformations (as opposed to other systems suggested in answers here) is also a great advantage. DMS provides tree-pattern matches and rewrites; it augments this with full control and data flow analysis that can be used to extend the conditions you'd like to match. A tool intended to be a compiler is just that, and you'll have a very hard time persuading it not to be a compiler, and instead to be a transformation engine as the OP requested.
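
To make "tree-pattern matches and rewrites" concrete, here is a toy sketch in plain Python. It is not DMS's rule language or API; the node classes (`Num`, `Var`, `BinOp`) and the two algebraic rules are invented purely for illustration. It matches patterns such as `e * 1` and `e + 0` on a small expression tree, rewrites them bottom-up, and regenerates text from the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


def rewrite(node):
    """Rewrite children first, then try each rule at this node."""
    if not isinstance(node, BinOp):
        return node
    left, right = rewrite(node.left), rewrite(node.right)
    # Rule: e * 1 -> e   and   1 * e -> e
    if node.op == "*" and right == Num(1):
        return left
    if node.op == "*" and left == Num(1):
        return right
    # Rule: e + 0 -> e   and   0 + e -> e
    if node.op == "+" and right == Num(0):
        return left
    if node.op == "+" and left == Num(0):
        return right
    return BinOp(node.op, left, right)


def unparse(node):
    """Regenerate (fully parenthesized) source text from the tree."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    return f"({unparse(node.left)} {node.op} {unparse(node.right)})"


if __name__ == "__main__":
    tree = BinOp("+", Var("a"), BinOp("*", Num(1), Var("b")))
    print(unparse(rewrite(tree)))
```

A real transformation engine adds what this toy leaves out: a full C front end, conditions based on flow analysis, and regeneration that preserves comments and layout.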

Ira Baxter
A: 

http://ctool.sourceforge.net/

plan9assembler
A: 

I've done small amounts of work on source-to-source transformations and I found CIL to be very powerful for this task. CIL has the advantage of being a framework specifically designed for static source analysis and transformation. It can also process code with any amount of ugly GCC-specific extensions. (It's been used to process the Linux kernel, as one example.) Unfortunately, it is written in OCaml, and analyses/transformations built using it must also be written in OCaml, which might be problematic if you've never used it.

Alternatively, clang is supposed to have a relatively easily hackable codebase and it can certainly be used to produce C ASTs.

Falaina
Yes, CIL will regenerate compilable C code, just not source code recognizable by the original programmer (for instance, what happens to comments?). That makes CIL useful for reasoning and code optimizations, but not for transforming a programmer's code.
Ira Baxter
A: 

"I tried ELSA but it is not getting compiled. (I am using Ubuntu 8.4)"

The Elkhound and Elsa source code, version 2005.08.22b from scottmcpeak.com/elkhound/ is outdated (old C++ style .h header files).

Elsa is working and is now part of Oink: http://www.cubewano.org/oink/#Gettingthecode. I have just got it working under Ubuntu 9.10.

eisbaw