
Given a function name and the number of parameters the function takes, how can I list all of its definitions using egrep and a regex?

For example, if the function name is "find", we want to find the definitions of "find" that have exactly three parameters, no more and no fewer, like the following:

sometype find ( type1 para1 , type2 para2 , type3 para3 )

I tried to solve the problem myself with:

egrep "find" * | egrep "([^,]\*,[^,]\*,[^,]\*)"

but it doesn't work. Could you point out what's wrong with the regex I used and, if possible, give me your solution to the "name: find, number of parameters: 3" problem?

+5  A: 

Using regexes for this is not reliable, and doubly so with egrep, unless the code follows some conventions and doesn't do anything too hard.

Consider:

void *
function(
    int a,
    void (*pointer)(const char *, int, double),
    double d
)

This declaration is spread over 6 lines, and egrep only looks at one line at a time.

This declaration contains 4 commas but only 3 parameters.
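A small Python sketch of both problems follows. Python's re module stands in for egrep, and count_top_level_params is a hypothetical helper written for this illustration. It tracks parenthesis depth so that commas inside the function-pointer parameter are not counted, and it assumes a non-empty, balanced parameter list with no comments, strings or macros.

```python
import re

# The declaration above, one string per source line (as egrep sees it).
DECL_LINES = [
    "void *",
    "function(",
    "    int a,",
    "    void (*pointer)(const char *, int, double),",
    "    double d",
    ")",
]

# An egrep-style "name followed by three comma-separated chunks" pattern.
THREE_ARGS = re.compile(r"\bfunction\s*\([^,]*,[^,]*,[^,]*\)")


def per_line_matches(lines):
    """Lines that match THREE_ARGS on their own, as a line-oriented grep tests them."""
    return [line for line in lines if THREE_ARGS.search(line)]


def count_top_level_params(text):
    """Hypothetical helper: count the parameters in the first (...) list of text.

    Commas inside nested parentheses are ignored. Assumes a non-empty,
    well-balanced parameter list with no comments, strings or macros.
    """
    depth = 0
    params = 1
    for ch in text[text.index("("):]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break
        elif ch == "," and depth == 1:
            params += 1
    return params


joined = " ".join(DECL_LINES)
print(per_line_matches(DECL_LINES))    # [] : no single line holds the whole list
print(joined.count(","))               # 4 commas in total...
print(count_top_level_params(joined))  # ...but only 3 parameters
```

Even this depth counting is only a heuristic. A real answer needs a C parser, not a regex.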

If you put enough restrictions on the code you are searching, you can probably get an approximation to what you are after, but C and C++ are both very hard to analyze. And I'm not even thinking about macros that invoke the function for you.


Your proposed solution has a number of flaws, even after solving the problem with the extraneous backslashes (diagnosed correctly by Tim Pietzcker):

egrep "find" * | egrep "\([^,]*,[^,]*,[^,]*\)"

This will discover lines such as:

find(1, 2, 3);
int extra_find(int a, int b, int c) { ... }
extraordinary(find, 3, 21);
printf("find: %.*s\n", 13, "heliotrope");
for (find(1); printf("%d %d\n", 1, 2); x++)
for (x(find, 1); b < max(c, d); i++)
/* find(1,2,3) */

Only one of those is a function definition, and that still isn't one of the outputs you wanted.
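The sketch below emulates the two-stage pipeline in Python so that the false positives are easy to check. It uses the re module as a stand-in for egrep. For these particular constructs (bracket expressions, escaped parentheses, the * quantifier), POSIX ERE and Python regexes behave the same. The sample lines are the ones listed above, and all seven pass both stages.

```python
import re

# The seven sample lines listed above (raw strings keep the C "\n" escapes literal).
SAMPLES = [
    "find(1, 2, 3);",
    "int extra_find(int a, int b, int c) { ... }",
    "extraordinary(find, 3, 21);",
    r'printf("find: %.*s\n", 13, "heliotrope");',
    r'for (find(1); printf("%d %d\n", 1, 2); x++)',
    "for (x(find, 1); b < max(c, d); i++)",
    "/* find(1,2,3) */",
]

# Stage 1, egrep "find": the name may appear anywhere on the line.
HAS_NAME = re.compile(r"find")
# Stage 2, egrep "\([^,]*,[^,]*,[^,]*\)": any "(...)" span containing two commas.
THREE_CHUNKS = re.compile(r"\([^,]*,[^,]*,[^,]*\)")


def egrep_pipeline(lines):
    """Keep lines that pass both stages, like piping one egrep into the other."""
    return [line for line in lines if HAS_NAME.search(line) and THREE_CHUNKS.search(line)]


for line in egrep_pipeline(SAMPLES):
    print(line)
```

Neither stage knows where "find" sits relative to the parentheses. That is why calls, string literals, comments and unrelated nested calls all get through.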

If you can play with Perl (or Python), or any tool with PCRE (Perl-Compatible Regular Expressions) or equivalent, then you can do things like ensure that, on a single line, the word 'find' appears followed by an open parenthesis, exactly three 'type name' pairs separated by commas and optional white space, and a close parenthesis.

perl -ne 'print if m/\bfind\s*\(\s*\w+\s+\w+(\s*,\s*\w+\s+\w+){2}\s*\)/'

But that doesn't handle pointers, arrays, qualifiers like 'const', or pointers to functions (or references, if you are using C++), or structures referenced by 'struct somename varname', or function definitions protected against macro expansion (int (getchar)(int c)), or ... And it still doesn't distinguish between declarations and definitions!
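Here is a hedged Python version of the Perl pattern above. It allows optional white space after the opening parenthesis, so the question's spaced-out example also matches. The candidate lines are made up for illustration. The sketch shows what the \b word boundary and the 'type name' pairs buy you, and what they still miss.

```python
import re

# Python version of the Perl pattern, with optional white space after "(".
STRICT = re.compile(r"\bfind\s*\(\s*\w+\s+\w+(\s*,\s*\w+\s+\w+){2}\s*\)")


def strict_matches(lines):
    """Lines on which the stricter pattern finds a match."""
    return [line for line in lines if STRICT.search(line)]


# Hypothetical candidate lines.
candidates = [
    "sometype find ( type1 para1 , type2 para2 , type3 para3 )",
    "int find(int a, int b, int c) {",
    "int find(int a, int b, int c);",           # a declaration: still matched
    "int extra_find(int a, int b, int c) {",    # rejected: no word boundary before "find"
    "find(1, 2, 3);",                           # rejected: no 'type name' pairs
    "int find(int a, int b) {",                 # rejected: only two parameters
    "int find(const char *s, int a, int b) {",  # rejected: 'const char *s' is not a simple pair
]

for line in strict_matches(candidates):
    print(line)
```

Only the first three lines are printed. The third is a prototype, which illustrates the declaration-versus-definition problem.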

Jonathan Leffler
+2  A: 

You are escaping the * where you shouldn't: it really is a quantifier here, and by escaping it you're trying to match a literal asterisk. On the other hand, you should escape the parentheses; unescaped, they form a group instead of matching literal parentheses.

So:

\([^,]*(,[^,]*){2}\)

would work better, but, as Jonathan Leffler wrote, it will only work in a very small subset of possible cases, so you should perhaps think about a different approach.
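The difference between the two patterns can be checked quickly in Python. The re module stands in for egrep here because it treats \*, \( \), [^,] and {2} the same way POSIX ERE does. The sample line is the question's example written with single-word parameter names.

```python
import re

# The asker's pattern as written: "\*" is a literal asterisk, and the bare
# parentheses form a group instead of matching "(" and ")".
ORIGINAL = re.compile(r"([^,]\*,[^,]\*,[^,]\*)")

# The corrected pattern: escaped parentheses, "*" used as a quantifier, and
# "(,[^,]*){2}" for two more comma-separated chunks before the ")".
CORRECTED = re.compile(r"\([^,]*(,[^,]*){2}\)")

LINE = "sometype find ( type1 para1 , type2 para2 , type3 para3 )"

print(bool(ORIGINAL.search(LINE)))   # False: the line contains no "*"
print(bool(CORRECTED.search(LINE)))  # True
```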

Tim Pietzcker
+1  A: 

How about a regex such as the following (Perl)?

find\s*\(\s*\w+\s+\w+\s*,\s*\w+\s+\w+\s*,\s*\w+\s+\w+\s*\)

Sandeep Satavlekar
Close: see my edited answer (which I wrote while you were adding yours). Your regular expression also finds 'extra_find(int i, int j, int k)', which is not strictly what is wanted. (But then, none of the other answers does precisely what is required either.)
Jonathan Leffler