I'd like to do a search for simple if statements in a collection of C source files.

These are statements of the form:

if (condition)
    statement;

Any amount of white space or other sequences (e.g. "} else ") might appear on the same line before the if. Comments may appear between the "if (condition)" and "statement;".

I want to exclude compound statements of the form:

if (condition)
{
    statement;
    statement;
}

I've tried each of the following in awk:

awk  '/if \(.*\)[^{]+;/ {print NR $0}' file.c    # (A) No results
awk  '/if \(.*\)[^{]+/ {print NR $0}' file.c    # (B)
awk  '/if \(.*\)/ {print NR $0}' file.c          # (C)

(B) and (C) give different results. Both include items I'm looking for and items I want to exclude. Part of the problem, obviously, is how to deal with patterns that span multiple lines.

Edge cases (badly formed comments, odd indenting or curly braces in odd places, etc.) can be ignored.

How can I accomplish this?

+1  A: 

I'm not sure how you'd do this with a one-liner. It is probably possible using sed's 'n' command to read the next line, but it would be very complicated, so you probably want to use a script for this. How about:

perl parse_if.pl file.c

Where parse_if.pl contains:

#!/usr/bin/perl -w

my $line_number = 0;
my $in_if = 0;
my $if_line = "";
# Scan through each line
while(<>)
{
    # Count the line number
    $line_number += 1;
    # If we're in an if block
    if ($in_if)
    {
        # Check for open braces (and ignore the rest of the if block
        # if there is one).
        if (/{/)
        {
            $in_if = 0;
        }
        # Check for semi-colons and report if present
        elsif (/;/)
        {
            print $if_line_number . ": " . $if_line;
            $in_if = 0;
        }
    }
    # If we're not in an if block, look for one and catch the end of the line
    elsif (/^[^#]*\b(?:if|else|while) \(.*\)(.*)/)
    {
        # Store the line contents
        $if_line = $_;
        $if_line_number = $line_number;
        # If the end of the line has a semicolon, report it
        if ($1 =~ ';')
        {
            print $if_line_number . ": " . $if_line;
        }
        # If the end of the line contains the opening brace, ignore this if
        elsif ($1 =~ '{')
        {
        }
        # Otherwise, read the following lines as they come in
        else
        {
            $in_if = 1;
        }
    }
}

I'm sure you could do something similar fairly easily in any other language (including awk) if you wanted to; I just thought Perl would be the quickest way for me to put together an example.
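For comparison, here is a rough sketch of the same line-by-line state machine written in awk and wrapped in a shell function. It is not a tested replacement for the Perl script, only an illustration of the idea. The function name find_simple_ifs is made up for this example, it only looks for "if" (not else/while), and skipping every line that starts with "#" is an assumption about how preprocessor directives should be handled. Like the Perl version, it is fooled by semicolons or braces inside comments.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): find_simple_ifs FILE...
# Prints "LINE: text" for each "if (...)" whose body is a single statement.
find_simple_ifs() {
    awk '
    # Reset the state at the start of every input file.
    FNR == 1 { in_if = 0 }

    # Assumption: preprocessor lines such as "#if (FOO)" are skipped entirely.
    /^[ \t]*#/ { next }

    # An if was seen on an earlier line and its body has not been found yet.
    in_if {
        if ($0 ~ /[{]/)     { in_if = 0 }                 # compound body: ignore
        else if ($0 ~ /;/)  { print if_line; in_if = 0 }  # simple body: report
        next
    }

    # A line containing "if (...)", with "if" not part of a longer identifier.
    /(^|[^A-Za-z0-9_])if \(.*\)/ {
        match($0, /(^|[^A-Za-z0-9_])if \(/)
        tail = substr($0, RSTART)   # text from the if keyword to end of line
        rest = $0
        sub(/.*\)/, "", rest)       # text after the last ")" on the line

        if (tail ~ /[{]/) next                        # brace on the same line
        if (rest ~ /;/) { print FNR ": " $0; next }   # statement on the same line
        if_line = FNR ": " $0                         # body is on a later line
        in_if = 1
    }
    ' "$@"
}
```

After sourcing the function, something like find_simple_ifs file.c lists the matching "if" lines with their line numbers. Checking for a brace before checking for a semicolon means a one-line block such as "if (x) { y(); }" is not reported.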

Al
I've posted a modified version based on yours. It fixes a couple of problems. One: Yours repeats found lines because a successful find of a semicolon doesn't terminate the block (there's no "$in_if = 0;" in the first "elsif"). Two: Yours prints the line number of the line with the semicolon for the line with the "if" (making "$if_line = $line_number . ": " . $_;" and removing that from the print statements fixes that).
Dennis Williamson
Good points, thanks for that (I just knocked up my code very quickly without too much care I guess). I'll modify my source to deal with these comments. I deliberately wanted to print the line containing the start of the if, so I think it should also print the line number of the if...
Al
I've also changed the check for the if line to look for else/while and to verify that there aren't any '#' characters before the keyword. This isn't the ideal way to do it (a more robust way would check for (#\s*) before the if/else), but I was being a little lazy. The only obvious problem that this would cause is on a line like: /* comment with # */ if (something). Obviously there are many ways this could be improved to do whatever you need it to do!
Al
Yeah, semicolons in comments throw it off, too.
Dennis Williamson
It's not necessarily that helpful, but an alternative answer: if you have access to a Lint-type tool that supports the MISRA-C guidelines, this will give you a list of all of that type of if () statement since non-braced if () statements violate the guidelines.
Al
A: 

Using awk, you can do this with:
awk 'BEGIN{flag=0}{if($0 ~ /if/ ){print $0;flag=NR+1}if(flag==NR)print $0 }' try.c

Neeraj
I don't see a test for "{".
Dennis Williamson
In my opinion, this is the kind of problem where one-liners get far too messy to work with.
Al
A: 

Based on Al's answer, but with fixes for a couple of problems. I also decided to check for simple else clauses, and it prints the full if block:

#!/usr/bin/perl -w

my $line_number = 0;
my $in_if = 0;
my $if_line = "";
# Lines collected after the "if" line while waiting for its body
my $block = "";
# Scan through each line
while(<>)
{
    # Count the line number
    $line_number += 1;
    # If we're in an if block
    if ($in_if)
    {
        $block = $block . $line_number . "+ " . $_;
        # Check for open braces (and ignore the rest of the if block
        # if there is one).
        if (/{/)
        {
            $in_if = 0;
            $block =  "";
        }
        # Check for semi-colons and report if present
        elsif (/;/)
        {
            print $if_line;
            print $block;
            $block = "";
            $in_if = 0;
        }
    }
    # If we're not in an if block, look for one and catch the end of the line
    elsif (/(if \(.*\)|[^#]else)(.*)/)
    {
        # Store the line contents
        $if_line = $line_number . ": " .  $_;
        # If the end of the line has a semicolon, report it
        if ($2 =~ ';')
        {
            print $if_line;
        }
        # If the end of the line contains the opening brace, ignore this if
        elsif ($2 =~ '{')
        {
        }
        # Otherwise, read the following lines as they come in
        else
        {
            $in_if = 1;
        }
    }
}
Dennis Williamson