Folks, could you please give me an example of writing a custom gcc preprocessor?

My goal is to replace SID("foo")-like macros with the corresponding precomputed CRC32 values. For any other macro I'd like to use the standard cpp preprocessor.

It looks like it's possible to achieve this using the -no-integrated-cpp and -B options, but I can't find any simple example of their usage.

+5  A: 

Warning: dangerous and ugly hack. Close your eyes now. You can hook in your own preprocessor by adding the '-no-integrated-cpp' and '-B' switches to the gcc command line. '-no-integrated-cpp' makes gcc run preprocessing as a separate pass instead of folding it into compilation, and '-B' names a directory that gcc searches for its subprograms before it falls back to its internal search path. A preprocessor invocation can be recognized by the 'cc1', 'cc1plus' or 'cc1obj' program (these are the C, C++ and Objective-C compilers) being invoked with the '-E' option. When there is no '-E' option, pass all the parameters through to the original program unchanged. When there is one, do your own preprocessing and pass the manipulated file to the original compiler.

It looks like this:

> cat cc1
#!/bin/sh

echo "My own special preprocessor -- $@"

/usr/lib/gcc/i486-linux-gnu/4.3/cc1 "$@"
exit $?

> chmod 755 cc1
> gcc -no-integrated-cpp -B$PWD x.c
My own special preprocessor -- -E -quiet x.c -mtune=generic -o /tmp/cc68tIbc.i
My own special preprocessor -- -fpreprocessed /tmp/cc68tIbc.i -quiet -dumpbase x.c -mtune=generic -auxbase x -o /tmp/cc0WGHdh.s

This example calls the original cc1 for both the preprocessing and the compilation step, but prints an additional message and the parameters each time. You can replace the script with your own preprocessor.

The bad hack is over. You can open your eyes now.

Rudi
It's not that bad since it does the job :)
pachanga
Hm...correct me if I'm wrong but I thought using this approach I could replace SID macros, save the result to some temp file and then apply the standard preprocessor to this temp file. No?
pachanga
@pachanga Yes, you need to extract the command line options for the input and output files, and write a second tempfile for the output of your own preprocessor (I believe you need to preserve the file extension). Then you pass the processed file as the input file to THE ORIGINAL(TM) preprocessor by patching the input file parameter. But leave all other parameters the way they were, since some of them are position-dependent (like -I, -D or -U). After THE ORIGINAL(TM) preprocessor is done, you clean up your tempfile and exit with the exit code of THE ORIGINAL(TM) preprocessor.
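The steps just described could be sketched roughly as follows. This is only an illustration under stated assumptions: the cc1 path is the one from the example above and will differ per system, the list of options that take a separate value is an incomplete guess, and the names find_input_index, run_wrapper, transform and runner are made up for this sketch.

```python
import os
import subprocess
import tempfile

# Assumption: location of the real cc1, copied from the example above.
REAL_CC1 = "/usr/lib/gcc/i486-linux-gnu/4.3/cc1"

# Assumption: options whose NEXT argument is a value, not the input file.
# This list is incomplete; extend it for the options your build uses.
OPTS_WITH_VALUE = {
    "-o", "-I", "-D", "-U", "-include", "-imacros", "-isystem",
    "-iquote", "-idirafter", "-dumpbase", "-auxbase", "-MF", "-MT", "-MQ",
}


def find_input_index(args):
    """Return the index of the first argument that looks like the input file."""
    skip_next = False
    for i, arg in enumerate(args):
        if skip_next:
            skip_next = False
        elif arg in OPTS_WITH_VALUE:
            skip_next = True
        elif not arg.startswith("-"):
            return i
    return None


def run_wrapper(args, transform, real_cc1=REAL_CC1, runner=subprocess.call):
    """Run the real cc1; on a -E invocation, feed it a transformed copy."""
    idx = find_input_index(args) if "-E" in args else None
    if idx is None:
        # Not a preprocessing run: pass everything through untouched.
        return runner([real_cc1] + list(args))

    src = args[idx]
    with open(src) as f:
        text = f.read()

    # Keep the extension, and stay in the source directory so that
    # #include "..." lookups relative to the file still work.
    fd, tmp = tempfile.mkstemp(suffix=os.path.splitext(src)[1],
                               dir=os.path.dirname(src) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(transform(text))
        # Patch only the input file; all other options keep their position.
        patched = list(args[:idx]) + [tmp] + list(args[idx + 1:])
        return runner([real_cc1] + patched)
    finally:
        os.remove(tmp)

# Hypothetical use from an executable script named cc1 in the -B directory:
#     import sys
#     sys.exit(run_wrapper(sys.argv[1:], my_transform))
```

One side effect to be aware of: the real preprocessor sees the temp file's name, so __FILE__ and diagnostics will mention it unless the transformation also emits a #line directive.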
Rudi
+1  A: 

One way is to use a program transformation system, to "rewrite" just the SID macro invocation to what you want before you do the compilation, leaving the rest of the preprocessor handling to the compiler itself.

Our DMS Software Reengineering Toolkit is such a system; it can be applied to many languages, including C and specifically the dialects of the GCC 2/3/4 series of compilers.

To implement this idea using DMS, you would run DMS with its C front end over your source code before the compilation step. DMS can parse the code without expanding the preprocessor directives, build abstract syntax trees representing it, carry out transformations on the ASTs, and then spit out the result as compilable C text.

The specific transformation rule you would use is:

rule replace_SID_invocation(s:STRING):expression->expression
          = "SID(\s)" -> ComputeCRC32(s);

where ComputeCRC32 is custom code that does what it says. (DMS includes a CRC32 implementation, so the custom code for this is pretty short.)
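DMS's own CRC32 routine isn't shown here. For reference, this is a minimal sketch of the common CRC-32 (the reflected variant with polynomial 0xEDB88320, the one zlib computes), assuming that is the variant SID is meant to produce. The name crc32_bitwise is made up for illustration.

```python
def crc32_bitwise(data):
    """Bit-at-a-time CRC-32 over a bytes object (reflected 0xEDB88320 form)."""
    crc = 0xFFFFFFFF                      # initial value: all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):                # process one bit per iteration
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final inversion
```

Whatever computes the value at build time has to agree with the CRC32 variant the rest of the program uses at run time, so it is worth checking a few known strings against both.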

DMS is kind of a big hammer for this task. You could use Perl to implement something pretty similar. The difference with Perl (or some other string match/replace hack) is the risk that a) it might find the pattern someplace where you don't want a replacement, e.g.

  ... QSID("foo")... // this isn't a SID invocation

which you can probably fix by coding your pattern match carefully, b) it might fail to match a SID call found in surprising circumstances:

  ...   SID  ( /* master login id */  "Joel" )  ... // need to account for formatting and whitespace

and c) it might fail to handle the various kinds of escape characters that show up in the literal string itself:

  ...   SID("f\no\072") ...  // need to handle all of GCC's weird escapes

DMS's C front end handles all the escapes for you; the ComputeCRC32 function above would see the string containing the actual intended characters, not the raw text you see in the source code.

So it's really a matter of whether you care about the dark-corner cases, or if you think you may have more special processing to do.

Given the way you've described the problem, I'd be sorely tempted to go the Perl route first and simply outlaw the funny cases. If you can't do this, then the big hammer makes sense.
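The string-matching route might look something like the sketch below, written in Python rather than Perl. It tries to cover the three cases listed above (whole-identifier match, whitespace and /* */ comments between tokens, common C escapes), but it is still a textual hack: it will also rewrite SID("...") inside comments or other string literals. The names SID_RE, c_unescape and replace_sids are made up for this sketch, and zlib's CRC-32 is assumed to be the variant you want.

```python
import re
import zlib

# Whitespace or /* ... */ comments that may sit between tokens.
_GAP = r'(?:\s|/\*.*?\*/)*'

# \b keeps QSID("foo") from matching; the group captures the raw literal body.
SID_RE = re.compile(
    r'\bSID' + _GAP + r'\(' + _GAP +
    r'"((?:[^"\\\n]|\\.)*)"' + _GAP + r'\)',
    re.DOTALL,
)

_ESC_RE = re.compile(r'\\(x[0-9A-Fa-f]+|[0-7]{1,3}|.)', re.DOTALL)
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "a": "\a",
           "b": "\b", "f": "\f", "v": "\v"}


def c_unescape(body):
    """Turn the raw text of a C string literal into the bytes it denotes."""
    out = bytearray()
    pos = 0
    for m in _ESC_RE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")
        esc = m.group(1)
        if len(esc) > 1 and esc[0] == "x":
            out.append(int(esc[1:], 16) & 0xFF)      # \xHH
        elif esc[0] in "01234567":
            out.append(int(esc, 8) & 0xFF)           # \ooo
        else:
            out += _SIMPLE.get(esc, esc).encode("utf-8")  # \n, \\, \", ...
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)


def replace_sids(source):
    """Replace every SID("...") call with an unsigned hex CRC32 constant."""
    def repl(m):
        crc = zlib.crc32(c_unescape(m.group(1))) & 0xFFFFFFFF
        return "0x%08Xu" % crc
    return SID_RE.sub(repl, source)
```

Adjacent literals such as SID("a" "b"), macro arguments that are not literals, and trigraphs are deliberately left out; those are the "funny cases" you would have to outlaw.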

Ira Baxter