I have a C file that we are migrating to a new logging infrastructure. So

if ( logging_level >= LEVEL_FINE )
  printf("Value at %p is %d\n", p, i);

becomes

do_log2(LEVEL_FINE, "Value at %p is %d\n", _ptr(p), _num(i));

do_log2 means log with 2 arguments.

So I need a C parsing and modification infrastructure to do this.

Which tool can I use to accomplish this most easily?

Note: the printf can also appear in the file as :

if ( logging_level >= LEVEL_FINE )
{
  printf("Value at %p is %d\n", 
  p, 
  i);
}

(split across lines and inside a block), so this will be hard to do with simple text parsing in Perl.

EDIT: This is my final Perl code, which does what I want:

#!/usr/bin/perl -W 
$source=<<'END';
include etc
if ( logging_level >= LEVEL_DEBUG )
{
  printf("1:Value at %p is %d\n",
  p(1),
  i(2));
}
hello();
if ( logging_level >= LEVEL_FINE )
{
  printf("2:Value  is %d\n", i);
  printf("3:Value  is %d\n", i);
}
if ( logging_level >= LEVEL_FINE )
{
  printf("2:Value  is %d\n"
     "and other info", i);
}

other();
if(logging_level>=LEVEL_INFO){printf("4:Value at %p is %d %d\n",p(x),i,j);}
if(logging_level>=LEVEL_FINE) printf("5:Just sayin\"\n");
printf("not logging statement\n").
END

while( $source =~ m/\G(.*?\n)\s* if \s* \( \s* logging_level \s* >= \s* ([A-Z_0-9]+) \s* \) \s*(\{?)/sgxc )
{
    my $othercode = $1;
    my $loglevel=$2;
    my $inblock = $3;
    print("$othercode");

    while($source =~ m/\G\s*printf \( ([^;]*) \) \;/sgxc )
    {
    my $insideprint = $1;
    unless ($insideprint =~ /((\"([^\"\\]|\\.)*\")(\s*(\"([^\"\\]|\\.)*\"))*)/g) #fixing stackoverflow quote problem "
    {
        die "First arg not string literal";
    }
    my $formatstr = $1;
    my $remain = substr($insideprint, pos($insideprint));
    $remain =~ tr/\n \t//d;
    my @args = split(",", $remain);
    shift @args;

    my $numargs = @args;

    print "do_log${numargs}($loglevel, $formatstr";
    for (my $i=0; $i < $numargs; $i++)
    {
        unless ($formatstr =~ /%([a-z]+)/g)
        {
     die "Not enough format for args : $formatstr, args = ", join(",", @args), "\n";
        }
        my $lastchar = substr($1, length($1) -1);
        my $wrapper = "";
        if ($lastchar eq "u" || $lastchar eq  "d")
        { $wrapper = "_numeric";}
        elsif($lastchar eq "p"){ $wrapper = "_ptr";}
        elsif($lastchar eq "s"){ $wrapper = "_str";}
        else { die "Unknown format char %$lastchar in $formatstr"; }

        print ", ${wrapper}($args[$i])";
    }
    print ");";
    last unless ($inblock);
    }
# eat trailing }
    if ($inblock)
    {
    if ($source =~ m/\G \s* \} /sgxc)
    {
    }
    else
    {
    }
    }
}
#whatever is left 
print substr($source, pos($source));

output:

include etc
do_log2(LEVEL_DEBUG, "1:Value at %p is %d\n", _ptr(p(1)), _numeric(i(2)));
hello();
do_log1(LEVEL_FINE, "2:Value  is %d\n", _numeric(i));
do_log1(LEVEL_FINE, "3:Value  is %d\n", _numeric(i));
do_log1(LEVEL_FINE, "2:Value  is %d\n"
         "and other info", _numeric(i));

other();
do_log3(LEVEL_INFO, "4:Value at %p is %d %d\n", _ptr(p(x)), _numeric(i), _numeric(j));
do_log0(LEVEL_FINE, "5:Just sayin\"\n");
printf("not logging statement\n").

Woohoo! Now to apply to actual source code.

+4  A: 

What you need is a Program Transformation System that can parse C and carry out transformations on the essence of the code (e.g., on the corresponding compiler data structures) rather than the text (so it isn't confused by text layout, etc.). (Program transformation is a generalization of refactoring).

The DMS Software Reengineering Toolkit is such a program transformation system, and it has a C parser that has been applied to very large C systems.

With DMS, your change can be written as:

domain C; -- work with C language syntax

rule change_logging(exp: p, exp: i, s: literal_string, l:literal_integer): stmt -> stmt
  "if ( logging_level >= \l )
      printf(\s, \p, \i);"
  ->  
  "do_log2(\l, \s, _ptr(\p), _num(\i));".

The backslash-prefixed tokens are either metaquotes (a " belonging to the C code has to be escaped inside the rule quotes!) or metavariables (\p, \i, \s, \l) of the corresponding syntax type.

In practice, one writes a set of cooperating transformation rules to carry out a more complex task (you probably have do_log1 and do_log3 cases, too).

The pattern is translated, like the parsed C code, into the equivalent compiler data structures and then matched against the compiler data structures for the C code, which is why the formatting of the text doesn't matter. When a match is found, it is replaced by the compiler data structures for the right-hand side of the rule (the part after the second ->). After all transformations have been applied, the resulting compiler data structures are used to regenerate the modified text by applying the opposite of parsing: prettyprinting. Voila, your change is made.

There are some complications with macros and preprocessor directives, but those are even worse if you have to do this with string-hacking methods, as often implemented with Perl.

There are also complications involving reasoning about side effects, reaching definitions, pointer values, etc; DMS provides support to deal with all of these.

Ira Baxter
Is there such a system that is open source ?
Sid Datta
The closest thing is TXL and/or Stratego. The good news is that they have the same concept behind them. The more difficult news is that they do not have robust C parsers associated with them. These aren't easy systems to build. DMS took 15 years.
Ira Baxter
What's PERL, Ira?
Chris Lutz
+1  A: 

The benefits of C99 and __VA_ARGS__!
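
For readers who have not met it: the C99 feature mentioned above lets one macro forward any number of arguments, which is what would make per-count names such as do_log2 unnecessary if the logging API could be changed. A minimal sketch follows; the macro name LOG_IF, the numeric level values, and the use of printf as the sink are assumptions for illustration, not part of the asker's infrastructure.

```c
#include <stdio.h>

/* Hypothetical level values; the real ones live in the asker's code base. */
enum { LEVEL_INFO = 1, LEVEL_FINE = 2, LEVEL_DEBUG = 3 };

static int logging_level = LEVEL_FINE;

/* C99 variadic macro: __VA_ARGS__ stands for the format string plus all of
   its arguments. The expression evaluates to printf's return value (the
   number of characters written), or 0 when the message is filtered out. */
#define LOG_IF(level, ...) \
    (logging_level >= (level) ? printf(__VA_ARGS__) : 0)
```

With such a macro, the original check-and-printf pair would collapse to LOG_IF(LEVEL_FINE, "Value at %p is %d\n", (void *)p, i); with no argument count in the name.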

How rigidly does the code stick to the two example layouts? More specifically, do you ever have other activity (such as a loop) inside the if (logging_level...) conditions-with-braces? Or multiple printf() statements under the control of a single if?

If you don't have much creativity in the ways that the debugging code was (ab)used, then you can do it with an ad hoc Perl script - not beautiful, but this is a one-off change (though likely to be run on many files).

Handling the adornment of the parameters as in _ptr(p) and _num(i) is adding another level of complexity. You'd have to parse the string literal (trusting no-one got fancy enough to use anything other than a string literal) to work out what the types of the arguments need to be.
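
To make the format-string step concrete, here is a rough sketch in C of scanning a printf-style format and choosing one wrapper per conversion. The function names wrapper_for and wrappers_for_format are made up for this illustration; the wrapper names _num and _ptr come from the question and _str from the asker's script. It assumes only %d, %i, %u, %p and %s need handling, and it rejects everything else (including a * width) rather than guessing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one conversion character to a wrapper name; NULL if unsupported. */
static const char *wrapper_for(char conv)
{
    switch (conv) {
    case 'd': case 'i': case 'u': return "_num";
    case 'p':                     return "_ptr";
    case 's':                     return "_str";
    default:                      return NULL;
    }
}

/* Scan a printf-style format and store one wrapper name per conversion
   in out[0..max-1]. Returns the number of conversions found, or -1 if the
   format is malformed, uses an unsupported conversion, or needs more than
   max slots. */
static int wrappers_for_format(const char *fmt, const char **out, int max)
{
    int n = 0;

    assert(fmt != NULL && out != NULL);
    for (const char *s = fmt; *s != '\0'; s++) {
        if (*s != '%')
            continue;
        s++;
        if (*s == '%')
            continue;                      /* "%%" is a literal percent */
        /* Skip flags, width, precision and length modifiers. */
        while (*s != '\0' && strchr("-+ #0123456789.hlLjzt", *s) != NULL)
            s++;
        if (*s == '\0')
            return -1;                     /* format ends mid-conversion */
        const char *w = wrapper_for(*s);
        if (w == NULL || n == max)
            return -1;
        out[n++] = w;
    }
    return n;
}
```

The returned count doubles as a sanity check: if it differs from the number of arguments found in the call, the statement is one of the exceptions to handle by hand.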

Altogether, not a trivial exercise, especially if the developers were inventive. I'd expect to write a script that handled 90% or more of the cases, then deal with the exceptions as they are found.

Jonathan Leffler
+3  A: 

You don't need to count the arguments:

#include <stdio.h> 
#include <stdarg.h> 

void do_log( int level, char *format, ... ){
  va_list ap;
  va_start( ap, format );
  printf( "level: %i ", level ); vprintf( format, ap ); puts("");
  va_end(ap);
}

int main(){
  do_log( 1, "zero" );
  do_log( 2, "one: %i", 1 );
  do_log( 3, "one: %i two: %i", 1, 2 );
}

I would do the code rewrite with Perl. I don't see why it is hard.

EDIT: I wrote some Perl code to rewrite the logging code pieces:

#!/usr/bin/perl -W 

$source=<<'END';
if ( logging_level >= LEVEL_FINE )
{
  printf("1:Value at %p is %d\n",
  p(1),
  i(2));
}

if(logging_level>=LEVEL_FINE){printf("2:Value at %p is %d\n",p(x),i,j);}
END

$res = '';
while( $source =~ /\G(.*?)if\s*\(\s*logging_level\s*>=\s*([A-Z_]+)\s*\)\s*{\s*printf\s*(\(((?:[^()]+|(?3))+)\))\s*;\s*}/sg ){
  $lastpos = pos($source); $res .= $1; $l=$2; $p=$4; $p =~ s/[\r\n\s]+//g;
  $c = $p =~ tr/,/,/;
  $res .= "do_log$c($l,$p);";
}
print $res, substr($source,$lastpos);

Result:

do_log2(LEVEL_FINE,"1:Valueat%pis%d\n",p(1),i(2));

do_log3(LEVEL_FINE,"2:Valueat%pis%d\n",p(x),i,j);

I added simple argument counting to the code. Hope it helps.
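
One caveat about counting every comma: a comma inside the format string, a character literal, or a nested call such as f(x, y) would inflate the count. The sketch below shows the idea of counting only top-level commas, written in C for illustration; count_top_level_commas is a hypothetical helper name, and it assumes its input is the text between printf's outer parentheses with comments already removed.

```c
#include <assert.h>
#include <stddef.h>

/* Count commas that sit at bracket depth 0 and outside string or character
   literals. For the argument text of a printf call, the result is the
   number of arguments that follow the format string. */
static int count_top_level_commas(const char *text)
{
    int depth = 0;
    int commas = 0;
    char quote = '\0';           /* '"' or '\'' while inside a literal */

    assert(text != NULL);
    for (const char *s = text; *s != '\0'; s++) {
        if (quote != '\0') {
            if (*s == '\\' && s[1] != '\0')
                s++;             /* skip the escaped character */
            else if (*s == quote)
                quote = '\0';    /* literal ends */
        } else if (*s == '"' || *s == '\'') {
            quote = *s;          /* literal begins */
        } else if (*s == '(' || *s == '[' || *s == '{') {
            depth++;
        } else if (*s == ')' || *s == ']' || *s == '}') {
            depth--;
        } else if (*s == ',' && depth == 0) {
            commas++;
        }
    }
    return commas;
}
```

The same state machine could be used to split the arguments instead of just counting them, by recording the position of each top-level comma.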

sambowry
The logging infrastructure (do_log* etc.) is set in stone, hehe; I cannot implement it myself. I am not even allowed to write a simplifying wrapper. Large old system, strict rules.
Sid Datta
I still have to handle multiline strings and the _ptr, _num etc. concept, but your code is a good starting point. Thanks! Now let me see if I can decode that RE :P
Sid Datta
A: 

How many times is logging_level referenced? The process is called refactoring. If the change is trivial, a good regular expression can be used in your favorite editor. But often the code has many variations on the same theme. In this case, all of them can be found through logging_level. You can phase them out by hiding the value of logging_level from the code (so you'd get a compiler warning, but it would still work). Or use an editor like Source Insight, which can show you all references in one go.

Some examples of variations (which would be hard to find with a script):

if ( logging_level >= LEVEL_FINE )
  printf("Value at %p is %d\n", p, i);

if ( logging_level >= LEVEL_FINE ) {
  calculated_value = i*2/3;
  printf("Value at %p is %f\n", p, calculated_value);
}

(note the braces and the calculated variable).

For each file with the old construction, you can do a search and replace:

search for : if \(\s*logging_level\s*>=\s*(LEVEL_[a-zA-Z]+)
replace with : do_log2(\1,)

It is possible to include the printf, but only if your editor supports multi-line patterns.

Adriaan
+1  A: 

You might also want to check out Coccinelle, which Linux kernel hackers use for automated, large-scale code transformations via semantic patches.

Hope this helps

none
+2  A: 

The Coccinelle solution would be:

@@
expression p,i;
@@

-if ( logging_level >= LEVEL_FINE )
-  printf("Value at %p is %d\n", p, i);
+do_log2(LEVEL_FINE, "Value at %p is %d\n", _ptr(p), _num(i));

There is no general way to solve the problem of calculated_value cited above, but it would be possible to find code that has this problem as follows:

@@
expression p,i;
@@

*if ( logging_level >= LEVEL_FINE )
   { ...
*  printf("Value at %p is %d\n", p, i);
   ... }

The result will look like a diff, but the minuses in column 0 are meant to indicate items of interest, not items to remove.

julia
Thanks, I hope Coccinelle offers some scripting capabilities so that arg counting etc. can be solved too. I solved my problem with Perl, but thanks anyway :)
Sid Datta