I'm somewhere on the learning curve when it comes to regular expressions, and I need to use them to automatically modify function prototypes in a bunch of C headers. Does anyone know of a decent regular expression to find any and all function prototypes in a C header, while excluding everything else?

Edit: Three things that weren't clear initially:

  1. I do not care about C++, only straight C. This means no templates, etc. to worry about.
  2. The solution must work with typedefs and structs; it must not be limited to basic C types.
  3. This is kind of a one-off thing. It does not need to be pretty. I do not care how much of a kludge it is as long as it works, but I don't want a complex, hard to implement solution.
+9  A: 

You could implement a parser using the ANSI C yacc/lex grammar.

Quassnoi
I can not repeat it enough. Regular expressions are not a substitute for a parser. +1.
dmckee
+3  A: 

To do this properly, you'll need to parse according to the C language grammar. But if this is for the C language only and for header files only, perhaps you can take some shortcuts and get by without a full-blown BNF grammar.

^
\s*
(?:(unsigned|signed)\s+)?                # optional sign, with its trailing space
(void|int|char|short|long|float|double)  # return type
\s+
(\w+)                                    # function name
\s*
\(
[^)]*                                    # args - total cop out
\)
\s*
;

This is by no means correct and needs work, but it could serve as a starting point if you're willing to put in some effort to improve it. It can be broken by function declarations that span lines, function-pointer arguments, macros, and probably many other things.

Note that BNF can be converted to a regex. It will be a big, complex regex, but it's doable.
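As a rough illustration of how the pattern above could be tried out, here is a minimal Python sketch. It assumes the pattern is compiled in verbose mode (so the inline comments are legal) and that the sign qualifier is optional together with its trailing whitespace. The names `PROTO_RE` and `find_prototypes` and the sample header are made up for the example. It shares all the limitations listed above.

```python
import re

# Verbose-mode version of the pattern; whitespace and "# ..." comments
# inside the pattern are ignored by the regex engine.
PROTO_RE = re.compile(
    r"""
    ^\s*
    (?:(unsigned|signed)\s+)?                  # optional sign qualifier
    (void|int|char|short|long|float|double)    # return type
    \s+
    (\w+)                                      # function name
    \s*
    \(
    ([^)]*)                                    # args - still a cop out
    \)
    \s*
    ;
    """,
    re.VERBOSE | re.MULTILINE,
)

def find_prototypes(source):
    """Return (return_type, name, args) tuples for simple prototypes."""
    results = []
    for m in PROTO_RE.finditer(source):
        sign, base, name, args = m.groups()
        return_type = f"{sign} {base}" if sign else base
        results.append((return_type, name, args.strip()))
    return results

# Hypothetical header contents, used only to exercise the pattern.
header = """
int add(int a, int b);
unsigned long big(void);
    void log_msg(const char *fmt, ...);
FILE *open_it(const char *path);
"""

for proto in find_prototypes(header):
    print(proto)
```

The `FILE *open_it(...)` line is skipped, which shows the typedef and pointer limitation in practice.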

Paul Beckingham
I suspect classic regexes are unsuitable for parsing nested C++ template definitions.
J.F. Sebastian
True - mercifully, they're dealing with C and not C++.
Jonathan Leffler
Classic regular expressions are non-recursive, so they can't really express everything in BNF.
Darron
Your notation does not account for qualifiers (const, volatile, restrict), or unsigned long int or long double or pointers or arrays or typedefs (FILE *?) at minimum. You get +1; you say it is incomplete but it is a start. But those are just some of the factors that will have to be worried about.
Jonathan Leffler
It will work only with functions that return standard C types, not custom types.
qrdl
yeah, he says it's incomplete. consider void (*(*baz(int[static 42]))[3])(char(*)[7]); no readable regex could ever parse such things i suspect :) but he gets +1 for same reasons as Jonathan pointed out.
Johannes Schaub - litb
+5  A: 

For a one-off exercise, you'd probably do best by starting simple and looking at the code you have to scan. Pick the three worst headers and write a regex, or a series of regexes, that does the job on them. You have to decide whether and how you are going to deal with comments that contain function declarations (and, indeed, with function declarations that contain comments). Dealing with:

extern void (*function(int, void (*)(int)))(int);

(which could be the Standard C function signal()) is tough in a regex because of the nested parentheses. If you don't have any such function prototypes, time spent working out how to deal with them is time wasted. Similar comments apply to pointers to multi-dimensional arrays. The chances are that you have stylistic conventions that simplify your life. If you don't use C99 (C++-style) // comments, you don't need to code around them. You probably don't put multiple declarations on a single line, either with or without a common type, so you don't have to deal with that either.

extern int func1(int), func2(double); double func3(int);  // Nasty!
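If such declarations do turn up, one way around the nested-parentheses problem is to count parenthesis depth in a few lines of code instead of in the regex. The sketch below is only an illustration. The helper names `matching_paren` and `split_top_level` are invented for it, and it assumes comments and string literals have already been removed.

```python
def matching_paren(text, open_idx):
    """Return the index of the ')' that closes the '(' at open_idx."""
    if text[open_idx] != "(":
        raise ValueError("open_idx must point at '('")
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")

def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        parts.append(tail)
    return parts

decl = "extern void (*function(int, void (*)(int)))(int);"
outer = decl.index("(")             # '(' that opens "(*function(...))"
inner = decl.index("(", outer + 1)  # '(' that opens the parameter list
params = decl[inner + 1:matching_paren(decl, inner)]
print(split_top_level(params, ","))

nasty = "extern int func1(int), func2(double); double func3(int);"
for statement in split_top_level(nasty, ";"):
    print(split_top_level(statement, ","))
```

Splitting on top-level `;` and then on top-level `,` separates the declarators in the multi-declaration line. The second and later declarators lose their shared type, which would have to be carried over by hand.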
Jonathan Leffler
A: 

Let's say you have the whole C file read into $buffer.

  * First, create a regexp that replaces all comments with an equal number of spaces and linefeeds, so that row and column positions don't change.
  * Create a regexp that can handle a parenthesized string.
  * Then a regexp like this finds functions: (static|)\s+(\w+)\s*$parenthezized_regexp+*{

This regexp does not handle functions whose definitions contain preprocessor directives.

If you go for lex/yacc, you have to combine the ANSI C and preprocessor grammars to handle preprocessor directives inside function definitions.
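The comment-blanking step described above might look something like the following Python sketch. It is one possible approach, not a tested tool. `TOKEN_RE` and `blank_comments` are names invented here, and it assumes that string and character literals should be left untouched and that line comments are not continued with a trailing backslash.

```python
import re

# Matches comments and string/char literals. Literals are matched so that
# comment markers inside them are not mistaken for real comments.
TOKEN_RE = re.compile(
    r"""
      /\*.*?\*/                 # block comment (may span lines)
    | //[^\n]*                  # line comment
    | "(?:\\.|[^"\\\n])*"       # string literal
    | '(?:\\.|[^'\\\n])*'       # character literal
    """,
    re.VERBOSE | re.DOTALL,
)

def blank_comments(source):
    """Replace comment text with spaces, keeping every newline in place."""
    def repl(match):
        text = match.group(0)
        if text.startswith("/*") or text.startswith("//"):
            # Keep newlines so row and column positions do not shift.
            return re.sub(r"[^\n]", " ", text)
        return text  # string or character literal: leave as is
    return TOKEN_RE.sub(repl, source)

# Hypothetical input, used only to exercise the function.
src = 'int a; /* int f(void);\n   more */ int g(void); // x\nchar *s = "/* no */";\n'
print(blank_comments(src))
```

Because the output has the same length and the same newline positions as the input, any match offsets found in the blanked text can be applied directly to the original buffer.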

ihu