It should turn this

int Yada (int yada)
{
   return yada;
}

into this

int Yada (int yada)
{
   SOME_HEIDEGGER_QUOTE;
   return yada;
}

but for all (or at least a large share of) syntactically legal C/C++ function and method constructs.

Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a few lines of code.

My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
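
For illustration, here is one shape the inserted line could take. This is only a sketch under assumptions: SOME_HEIDEGGER_QUOTE is treated as a macro that creates a small RAII object, and the class name ScopeTrace and the variable scope_trace_ are hypothetical, not taken from any existing library. The object logs on entry and again when the function returns by any path.

```cpp
#include <cstdio>

// Hypothetical helper: logs when constructed (function entry)
// and when destroyed (function exit, including early returns).
class ScopeTrace {
public:
    explicit ScopeTrace(const char* name) : name_(name) {
        std::fprintf(stderr, "enter %s\n", name_);
    }
    ~ScopeTrace() { std::fprintf(stderr, "leave %s\n", name_); }

    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    const char* name_;
};

// Assumed expansion of the placeholder from the question.
// __func__ is the standard predefined name of the enclosing function.
#define SOME_HEIDEGGER_QUOTE ScopeTrace scope_trace_(__func__)

int Yada(int yada)
{
   SOME_HEIDEGGER_QUOTE;
   return yada;
}
```

With a macro like this, the tool only has to insert one fixed line after each opening brace, and the exit trace comes for free.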

+1  A: 

I use this regex,

"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"

to locate the functions and add extra lines of code.

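With that regex I also get the function name (group 1). As written, group 2 only captures the optional const; to get the arguments as well, wrap the parameter-list part of the pattern in its own capturing group.
Note: you must filter out names like "while", "do", "for", "switch".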
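
A minimal sketch of how this heuristic could be applied, assuming C++11 std::regex. Two things differ from the regex above and are my assumptions: std::regex's ECMAScript grammar has no lookbehind, so the leading (?<=...) part is dropped, and the parameter list gets its own capturing group. The function name add_trace and the keyword list are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <set>
#include <string>

// Inserts trace_line after every opening brace that looks like the
// start of a function body. Heuristic only: it misses some functions
// (e.g. function-pointer parameters) and relies on the keyword filter.
std::string add_trace(const std::string& source, const std::string& trace_line) {
    // Group 1: name, group 2: parameter list, group 3: optional const.
    static const std::regex fn_head(
        R"((\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*\{)");
    static const std::set<std::string> keywords = {
        "if", "while", "for", "switch", "catch"};

    std::string out;
    std::size_t last = 0;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(source.begin(), source.end(), fn_head);
         it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t stop =
            static_cast<std::size_t>(m.position(0) + m.length(0));
        // Copy everything up to and including the opening brace.
        out.append(source, last, stop - last);
        last = stop;
        // Skip control-flow statements that merely look like functions.
        if (keywords.count(m[1].str()) == 0) {
            out += "\n   " + trace_line;
        }
    }
    out.append(source, last, std::string::npos);
    return out;
}
```

Calling add_trace on the Yada example from the question with "SOME_HEIDEGGER_QUOTE;" as the trace line gives the desired output. Review the result by hand, since the match is a heuristic.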

Nick D
Looks as though any parameters of pointer-to-function type would need to be typedefed, right? This doesn't handle parentheses inside the parameter list. Not that I'm claiming it should: general parsing of C++ is not best done by regexps...
Steve Jessop
You are right, regexps are not *proper* for parsing. My regex is a sort of *heuristic*. It will find many functions (or methods) but not all of them.
Nick D
+1 for being close enough for jazz, though :-)
Steve Jessop
+2  A: 

There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.

But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.

People are always asking here for ideas for projects - how about this for one?

anon
If I am not mistaken, popular programming editors use regexes to identify routines (not always successfully). Now, if you are not going to create *production* code, you can readjust or fix the results from a regex, and you don't want to spend time on complex real parsers, why not use a regex instead?
Nick D
As you say, not always with success. This is not a big problem with interactive editors, as you can spot any problems by visual inspection. It is a problem for batch operations on hundreds of thousands of lines of code, where you can't. And if you are not going to create production code, go and do something else.
anon
Inserting extra (trace) code into functions, i.e. for code *exploring*, is a case where you can throw away that code. That's what I meant by non-production code. Even with a real parser, how can you be sure that you won't have any problems when you modify hundreds of routines?
Nick D
Because it is a real parser. That's like saying "How can I be sure that when I say x = 1 then 1 really does get assigned to x?" I am assuming the use of one that is fairly well tested like GCC. Note I did NOT suggest writing a new parser.
anon
Our extra code will be placed in every function. That's the problem. With a regex we can be more selective. OK, maybe we could accomplish that with the parser. Anyway, of one thing I'm sure: I will try the AspectC++ that Phil suggested.
Nick D
Looks like Phil's answer is a parser? AspectC++ is implemented as a C++ compiler that supports additional extensions i.e. it parses C++.
MarkJ
See other answers, which include a tool that has a full C++ parser and can automate arbitrary transformations.
Ira Baxter
+10  A: 

Try AspectC++ (www.aspectc.org). You can define an aspect that will pick up every method execution.

In fact, the quickstart has pretty much exactly what you are after defined as an example: http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf

Phil
@Phil I don't know if you are associated with AspectC++ in any way, but if you are, please provide (or get them to provide) a single HTML link that describes succinctly what it does. The PDF you linked is a quickref card (useless if you don't understand what it is quickrefing), and I can't find anything better on the website.
anon
http://en.wikipedia.org/wiki/AspectC%2B%2B has a decent explanation of what AspectC++ is. Actually, you probably want to start with http://en.wikipedia.org/wiki/Aspect-oriented_programming if you're not familiar with aspect-oriented programming.
dancavallaro
Sorry, I'm not affiliated with AspectC++ in any way. It is worth wrapping your head around Aspect Oriented Programming, though. It can be surprisingly powerful.
Phil
+5  A: 

If you build using GCC and the -pg flag, GCC will automatically emit a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the Linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that mcount() should be written in assembler and must preserve all registers!

Also, note that -pg should be passed only when compiling, not when linking, or GCC will link in the profiling libraries that normally implement mcount.

bdonlan
nice to know. great.
elcuco
+4  A: 

I would suggest using the GCC flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another ("__cyg_profile_func_exit") upon exit. Each hook is passed the address of the function being entered/exited and the address of the call site it was called from.

You can turn instrumenting off on a per-function or per-file basis... see the docs for details.

The feature goes back at least as far as version 3.0.4 (from February 2002).

This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).

This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
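
A minimal sketch of the two hooks, assuming GCC (or a compatible compiler) and a build with -finstrument-functions. The hook names and signatures are the ones GCC documents; the trace_depth counter and the log format are my own additions for illustration.

```cpp
#include <cstdio>

// Hypothetical nesting counter, used only to indent the log output.
static int trace_depth = 0;

extern "C" {

// The hooks themselves must not be instrumented, or they would
// call themselves recursively.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    std::fprintf(stderr, "%*senter %p (from %p)\n",
                 trace_depth * 2, "", this_fn, call_site);
    ++trace_depth;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    --trace_depth;
    std::fprintf(stderr, "%*sleave %p (from %p)\n",
                 trace_depth * 2, "", this_fn, call_site);
}

}  // extern "C"
```

The hooks only receive addresses, so a tool such as addr2line is typically needed afterwards to map them back to function names. The counter is not thread-safe; a multi-threaded program would need a per-thread depth.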

Mark Santesson
/Gh and /GH for MSVC, calling _penter and _pexit
MSalters
A: 

This can be easily done with a program transformation system.

The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.

DMS parses source code (using a full language front end, in this case for C++), builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C++ program into another with whatever properties you wish. The transformation rule to accomplish exactly the task you specified would be:

domain Cpp.

insert_trace():function->function
  "\visibility \returntype \fnname(int \parametername)
   { \body  } "
       ->
  "\visibility \returntype \fnname(int \parametername)
   { Heidegger(\CppString\(\fnname\),
               \CppString\(\parametername\),
               \parametername);
      \body } "

The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain Cpp"). The \foo notations are meta syntax.

This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.

It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.

Ira Baxter