Hi! I'd like to ask for your ideas on this. For an important reason, I need to extract the names of all functions that are called inside the "main()" function of a C source file (e.g. main.c).

Example source code:

int main()
{
    int a = functionA(); // functionA must be extracted
    int b = functionB(); // functionB must be extracted
}

As you know, the only thing I can use as a marker to identify these function calls is their parentheses "()". I've already considered several factors in implementing this function name extraction:
1. Functions may have parameters. Ex: functionA(100)
2. Loop keywords. Ex: while()
3. Other keywords. Ex: if(), else if()
4. Operators between function calls with no spaces. Ex: functionA()+functionB()

I know what you're thinking at this point: this is a pain in the $$$... So please share your thoughts and ideas, and bear with me on this one...

Note: the extraction program itself is to be written in C++...

+2  A: 

You can write a small C++ parser by combining FLEX (or LEX) and BISON (or YACC).

  1. Take C++'s grammar
  2. Generate a C++ program parser with the mentioned tools
  3. Make that program count the function calls you are mentioning

Maybe a little bit too complicated for what you need to do, but it should certainly work. And LEX/YACC are amazing tools!

Pablo Santa Cruz
Thanks Pablo! Honestly, I don't have a clue what LEX/YACC is, so I will research it. Regarding your step 3, I don't get what you mean.
Hisoka
You will write a program (mostly) generated by FLEX and BISON. You will be able to identify function calls in that program very easily. That's point #3.
Pablo Santa Cruz
A: 

One option is to write your own C tokenizer (simple: just be careful to skip over strings, character constants and comments), and to write a simple parser on top of it, which tracks how many { are currently open and finds instances of identifier + ( inside main's braces. However, this won't be 100% correct. The disadvantage of this option is that it's cumbersome to handle preprocessor directives (e.g. #include and #define): a function can be called from a macro (e.g. getchar) defined in an #include file.

An option that works 100% is compiling your .c file to an assembly file, e.g. gcc -S file.c, and finding the call instructions in the resulting file.s. A similar option is compiling your .c file to an object file, e.g. gcc -c file.c, generating a disassembly dump with objdump -d file.o, and searching for call instructions.
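
If the scanning itself also has to be done in C++, here is a minimal sketch for the gcc -S route. It rests on several assumptions: x86 assembly in AT&T syntax as gcc emits it on Linux, a label spelled exactly main:, compiler-local labels starting with .L, and unoptimized output (with optimization, calls can be inlined or turned into jumps). The name callTargetsInMain is made up for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects direct call targets found between the
// "main:" label and the next non-local label in the assembly text.
std::vector<std::string> callTargetsInMain(const std::string& assembly) {
    std::vector<std::string> targets;
    std::istringstream in(assembly);
    std::string line;
    bool inMain = false;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string first, second;
        words >> first >> second;            // mnemonic (or label) and first operand
        if (first.empty()) continue;         // blank line
        if (first.back() == ':') {           // a label line
            if (first == "main:") {
                inMain = true;
            } else if (first.compare(0, 2, ".L") != 0) {
                inMain = false;              // assumed: another function starts here
            }
            continue;
        }
        if (!inMain) continue;
        bool isCall = (first == "call" || first == "callq");
        // Skip indirect calls such as "call *%rax": they carry no function name.
        if (isCall && !second.empty() && second[0] != '*') {
            // Drop a relocation suffix such as "@PLT" if present.
            targets.push_back(second.substr(0, second.find('@')));
        }
    }
    return targets;
}
```

The input would be the contents of file.s read into a string. Calls made through function pointers are skipped here because the assembly no longer names the callee.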

Another option is finding a parser using Clang / LLVM.

pts
Thank you very much... I will consider your ideas! Thanks again...
Hisoka
@Hisoka: If you find my answer useful, please vote it up (by clicking the /\ up arrow to the left of the answer) once you have gained 15 reputation in total.
pts
+1  A: 

GNU cflow might be helpful.

jokester
Thanks, sir jokester! But what I specifically need to do is write a simple C++ program that parses out all the function names... Thanks again for your answer, sir...
Hisoka