I want to extract only the function prototypes, such as

int my_func(char, int, float)
void my_func1(void)
my_func2()

from C files, using a regex in Python.

Here is the regex I have so far: ".*\(.*|[\r\n]\)\n"

+2  A: 

This is a convenient script I wrote for such tasks, but it won't give you the return types. It only extracts the function names and argument lists.

# Extract routine signatures from a C++ module
import re

def loadtxt(filename):
    "Load a text file into a string. File exceptions are allowed to propagate."
    f = open(filename)
    txt = ''.join(f.readlines())
    f.close()
    return txt

# regex: group 1 is the whole match, group 2 the name, group 3 the arguments
rproc = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
code = loadtxt('your file name here')
# control-flow keywords that would otherwise be reported as routines
cppwords = ['if', 'while', 'do', 'for', 'switch']
procs = [(i.group(2), i.group(3)) for i in re.finditer(rproc, code)
         if i.group(2) not in cppwords]

for name, args in procs:
    print(name + '(' + args + ')')
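
To make it easier to see what the pattern picks up, here is a minimal sketch that runs the same regex on an inline sample instead of a file. The find_signatures helper and the SAMPLE text are made up for illustration. Note that the trailing lookahead for { means the pattern finds function definitions (a body follows), not declarations that end in a semicolon.

```python
import re

# Same pattern as in the script: group 2 is the name, group 3 the arguments.
RPROC = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
KEYWORDS = {'if', 'while', 'do', 'for', 'switch'}

def find_signatures(code):
    """Return (name, arguments) pairs for everything that looks like a definition."""
    return [(m.group(2), m.group(3)) for m in re.finditer(RPROC, code)
            if m.group(2) not in KEYWORDS]

# Hypothetical sample input, not taken from a real project.
SAMPLE = """
int add(int a, int b) {
    return a + b;
}

void run(void)
{
    int n = 3;
    while (n) { n = n - 1; }
}
"""

for name, args in find_signatures(SAMPLE):
    print(name + '(' + args + ')')
```

The while (n) { line also matches the raw pattern, which is why the keyword filter is needed.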
Nick D
Thanks a lot, it's working.
Use f.read() instead of ''.join(f.readlines()).
Michał Niklas
@Michal, I believe I had a reason for doing it that way, but I can't remember it at the moment :)
Nick D
I'm glad I could help
Nick D
+2  A: 

See whether your C compiler has an option to write out a file containing just the prototypes of what it is compiling. For gcc, it's -aux-info FILENAME.
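
Since the question asks for Python, here is a rough sketch of driving that option from a script. It assumes gcc is installed and on the PATH. The helper names and file names are hypothetical, and adding -c (compile only, no linking) is my own choice rather than something the option requires.

```python
import subprocess

def aux_info_command(source, output, compiler="gcc"):
    """Build a command line that compiles `source` and writes prototypes to `output`."""
    # -c: compile without linking; -aux-info FILE: write the declarations to FILE.
    return [compiler, "-c", source, "-aux-info", output]

def write_prototypes(source, output):
    """Run the compiler and return the contents of the generated file."""
    # check=True raises CalledProcessError if the compiler reports an error.
    subprocess.run(aux_info_command(source, output), check=True)
    with open(output) as f:
        return f.read()

# Print the command for a hypothetical source file without running it.
print(" ".join(aux_info_command("md5.c", "md5.protos")))
```

Calling write_prototypes("md5.c", "md5.protos") would then return the generated declarations as text, including those pulled in from headers.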

ysth
A: 

I think a regex isn't the best solution in your case. There are many traps, such as comments and text inside string literals, but if your function prototypes share a common style:

type fun_name(args);

then \w+ \w+\(.*\); should work in most cases:

mn> egrep "\w+ \w+\(.*\);" *.h
md5.h:extern bool md5_hash(const void *buff, size_t len, char *hexsum);
md5file.h:int check_md5files(const char *filewithsums, const char *filemd5sum);
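
The same filter can be sketched in Python, which is what the question asked for. The grep_prototypes helper and the HEADER text are illustrative only. The md5_name line is a made-up declaration, added to show that a pointer return type written as char *name(...) slips past this pattern.

```python
import re

# Same pattern as the egrep call above.
PROTO_LINE = re.compile(r"\w+ \w+\(.*\);")

def grep_prototypes(text):
    """Return the stripped lines of `text` that contain a prototype-like match."""
    return [line.strip() for line in text.splitlines() if PROTO_LINE.search(line)]

# Hypothetical header contents; md5_name is invented for this example.
HEADER = """\
#define MD5_LEN 32
extern bool md5_hash(const void *buff, size_t len, char *hexsum);
int check_md5files(const char *filewithsums, const char *filemd5sum);
char *md5_name(int id);
"""

for line in grep_prototypes(HEADER):
    print(line)
```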
Michał Niklas
+1 for mentioning that this is not a good idea. C source is not a regular language.
Svante
A: 

I think this one should do the work:

r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$"

which will be expanded into:

string begin:   
     ^
any number of whitespaces (including none):
     \s*
return type:
  - start with a word character (\w already covers letters, digits and _):
     [\w_]
  - continue with any letter, digit or _:
     [\w\d_]*
any number of whitespaces:
     \s*
any number of any characters
  (to allow pointers, arrays and so on;
  could be replaced with more detailed checking):
     .*
any number of whitespaces:
     \s*
function name:
  - start with a word character (\w already covers letters, digits and _):
     [\w_]
  - continue with any letter, digit or _:
     [\w\d_]*
any number of whitespaces:
     \s*
open arguments list:
     \(
arguments (allow none):
     .*
close arguments list:
     \)
any number of whitespaces:
     \s*
string end:
     $

It won't match every possible form, but it should work in most cases. If you want it to be more accurate, just let me know.
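
Here is a small sketch of trying the pattern line by line, using the three examples from the question plus two extra lines I made up. The looks_like_prototype helper is hypothetical. The last line shows one of the cases where the pattern is too permissive: a control statement such as while (n) also matches.

```python
import re

# The pattern from above, applied to one line at a time.
PROTO = re.compile(r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$")

def looks_like_prototype(line):
    """Return True if the whole line matches the prototype pattern."""
    return PROTO.match(line) is not None

LINES = [
    "int my_func(char, int, float)",   # from the question
    "void my_func1(void)",             # from the question
    "my_func2()",                      # from the question
    "int counter = 0;",                # made up: no parentheses, rejected
    "while (n)",                       # made up: a false positive
]

for line in LINES:
    print(looks_like_prototype(line), line)
```

Because the pattern ends with \)\s*$, a declaration with a trailing semicolon is rejected, so it suits the unterminated style shown in the question.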

EDIT: Disclaimer - I'm quite new to both Python and Regex, so please be indulgent ;)

paffnucy
+1  A: 

There are LOTS of pitfalls in trying to "parse" C code (or even just extract some information from it) with regular expressions alone. I would definitely borrow a C grammar for your favourite parser generator (say Bison, or whatever alternative there is for Python; C grammars are available as examples everywhere) and add the actions to the corresponding rules.

Also, do not forget to run the C preprocessor on the file before parsing.
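
To illustrate one of those pitfalls, here is a hedged sketch in which a simple line regex reports prototype-like text found inside a comment and a string literal. Blanking comments and literals first removes these two false positives, but it is only a partial mitigation and no substitute for a real parser or for the preprocessor. All names and the sample text are made up for the example.

```python
import re

# Comments and string/char literals, so they can be blanked out before matching.
_SKIP = re.compile(r"""
    //[^\n]*              # line comment
  | /\*.*?\*/             # block comment
  | "(?:\\.|[^"\\])*"     # string literal
  | '(?:\\.|[^'\\])*'     # character literal
""", re.VERBOSE | re.DOTALL)

# A simple "type name(args);" line pattern.
PROTO_LINE = re.compile(r"\w+ \w+\(.*\);")

def strip_comments_and_strings(code):
    """Replace comments with a space and literals with an empty literal."""
    def blank(m):
        text = m.group(0)
        return " " if text.startswith("/") else text[0] * 2
    return _SKIP.sub(blank, code)

def find_prototypes(code):
    """Return the stripped lines that contain a prototype-like match."""
    return [l.strip() for l in code.splitlines() if PROTO_LINE.search(l)]

# Hypothetical input with one real prototype and two traps.
SAMPLE = '''\
/* old API: int legacy_sum(int a, int b); */
int sum(int a, int b);
const char *msg = "void fake(void);";
'''

print(len(find_prototypes(SAMPLE)))                          # naive: all three lines
print(find_prototypes(strip_comments_and_strings(SAMPLE)))   # only the real one
```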

fortran
Exactly. No matter how good you make your regular expression, it will always be at best a crude approximation of the processing done by an actual C parser. Why try to reinvent the wheel when there are already tools out there tailored for this purpose?
Kamil Kisiel