I have a list of function calls stored in a database, and for some of them I care about what arguments they are called with. My program (written in C#) parses C source code, and I'm trying to find the best way of extracting those function calls together with their arguments. I read the whole source file into a string before parsing it (so I am not running a stream reader over the file). I tried parsing the string with a regex (regexes are somewhat new to me), but I got back more than just the function call when using a pattern like this: functionCall + ".*\\)"; (I am escaping the opening ( in the function call.)

The function calls are stored in the following format in the DB

Function Call
============
some_Call(

There is a reason they are stored this way, and will not change.

Is there a good way to do this through regex, or would I be better suited to walk through the source code contents?

Let me know if any clarification is needed.

+1  A: 

I have written a quick regex and tested it, check the following:

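    // Needs: using System; using System.Text.RegularExpressions;
    string tst = "some_function(type<whatever> tesxt_112,type<whatever> tesxt_113){";

    // Group 1 captures what sits between the parentheses
    // (for a line with a single call, as here).
    Regex r = new Regex(".*\\((.*)\\)");
    Match m = r.Match(tst);
    if (m.Success)
    {
        // Naive split: assumes no argument contains a comma of its own.
        string[] arguments = m.Groups[1].Value.Split(',');
        for (int i = 0; i < arguments.Length; i++)
        {
            Console.WriteLine("Argument " + (i + 1) + " = " + arguments[i]);
        }
    }

    Console.ReadKey();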

So the output for the above string would be:

Argument 1 = type<whatever> tesxt_112

Argument 2 = type<whatever> tesxt_113

Hope this helps.

Andrew :-)

REA_ANDREW
What I really need is your "tst" string from the source code itself, but this will be super useful once I get that :-)
phsr
Are you having trouble getting this value from the DB?
REA_ANDREW
I used Daniel L's solution to find the function line and your solution to get the arguments
phsr
A: 

Not to discourage you, but... in C, I believe (vaguely) that you can do this:

void secondFunction(void) { /* no-op */ }

void firstFunction(void)
{
    void (*xyz)(void) = secondFunction; /* function pointer, not void* */

    xyz(); // this calls secondFunction
}

Is that a possible scenario? And what about other variants of pointer usage?!?

Say, functional-style type casting (C++ syntax, strictly speaking)?!?

int a;
float b = float(a); // call to the "float" function?!? NO! it's a type cast

Use a list of predefined types? What if the conversion was to a custom struct, and what about typedefs? Now you'd have to parse those too!

Seriously, use a parser!! There are several options already available that can parse C.

I think Regex is a rather bad tool for the job.

chakrit
+3  A: 

Part of the reason your solution failed is that you probably should have used a lazy match, .*?\), instead of a greedy one: a greedy .* runs on to the last ) it can find rather than stopping at the first.
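To make the greedy versus lazy difference concrete, here is a minimal sketch. The class and method names are made up for illustration, and it assumes the stored prefix (such as "some_Call(") is passed in as-is and escaped with Regex.Escape. Note that . does not match newlines by default, so this only looks within one line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyMatchDemo
{
    // Hypothetical helper: builds a pattern from the stored call prefix
    // (e.g. "some_Call(") and captures the text up to a closing ")".
    // lazy = true  -> ".*?" stops at the first ")" after the prefix
    // lazy = false -> ".*"  runs on to the last ")" on the line
    public static string FirstArguments(string source, string storedPrefix, bool lazy)
    {
        string quantifier = lazy ? ".*?" : ".*";
        // Regex.Escape turns the trailing "(" of the prefix into "\(".
        string pattern = Regex.Escape(storedPrefix) + "(" + quantifier + ")\\)";
        Match m = Regex.Match(source, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For the input "x = some_Call(a, b) + other(c);" the greedy form captures "a, b) + other(c", while the lazy form captures just "a, b".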

A complete answer would have to handle at least these cases:

Ignore parentheses in strings and chars (which you can do with a regex, although with escaping it can be a little complicated)

functionCall("\")", ')')

Ignore parentheses in comments (which you can do with a regex)

functionCall(/*)*/ 1, // )
2)

Don't match too much (which you can do with a regex)

functionCall(1) + functionCall(2) + (2 * 3) // Don't match past the first )

but it would also have to skip over nested, balanced parentheses

functionCall((1+(1))*(2+2))

This last one is something you can't do with a normal regex, because it involves counting parentheses, which regexes generally aren't suited for. However, it appears that .NET has ways to do this (balancing group definitions).
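As a rough sketch of how the .NET approach could look, the pattern below uses a balancing group to track nesting depth. The class, method, and group names ("args", "open") are my own, and the sketch assumes strings and comments containing parentheses have already been dealt with, since it does not handle them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedCallRegex
{
    // Hypothetical helper: returns the argument text of the first call that
    // starts with storedPrefix (e.g. "some_Call("), or null if none is found.
    public static string ExtractArguments(string source, string storedPrefix)
    {
        string pattern =
            Regex.Escape(storedPrefix) +
            // Consume one item at a time: a non-paren character, a "(" that
            // pushes onto the "open" stack, or a ")" that pops from it.
            // The pop fails when the stack is empty, which ends the loop
            // right before the ")" that closes the call itself.
            @"(?<args>(?:[^()]|(?<open>\()|(?<-open>\)))*)" +
            // Fail the match if any "(" is still unclosed.
            @"(?(open)(?!))" +
            // The ")" that closes the call.
            @"\)";
        Match m = Regex.Match(source, pattern);
        return m.Success ? m.Groups["args"].Value : null;
    }
}
```

With "y = functionCall((1+(1))*(2+2)) + (2 * 3);" and the prefix "functionCall(", this returns "(1+(1))*(2+2)" and leaves the trailing "(2 * 3)" alone.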

(And technically you would have to handle macros; I can imagine that a

#define close_paren )

would ruin your day...)

That said, you could likely come up with a naive solution (similar to what you had, or what some other poster recommends) and it would work for many cases, especially if you're working with known inputs.
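If the regex gets unwieldy, walking through the source by hand is the other option the question mentions. The following is only a sketch under stated assumptions: the names are hypothetical, comments and preprocessor lines are assumed to have been stripped beforehand, and commas inside braces (for example in compound literals) are not treated specially.

```csharp
using System;
using System.Collections.Generic;

public static class CallScanner
{
    // Hypothetical helper: finds the first occurrence of storedPrefix
    // (e.g. "some_Call(") and returns its top-level arguments.
    // Returns null if the prefix is absent or the call is never closed.
    public static List<string> ExtractArguments(string source, string storedPrefix)
    {
        int start = source.IndexOf(storedPrefix, StringComparison.Ordinal);
        if (start < 0) return null;

        var args = new List<string>();
        int depth = 1;                                 // the prefix ends with "("
        int argStart = start + storedPrefix.Length;

        for (int i = argStart; i < source.Length; i++)
        {
            char c = source[i];
            if (c == '"' || c == '\'')
            {
                // Skip a string or char literal, honoring backslash escapes.
                char quote = c;
                for (i++; i < source.Length && source[i] != quote; i++)
                {
                    if (source[i] == '\\') i++;
                }
            }
            else if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                depth--;
                if (depth == 0)
                {
                    // End of the call: emit the final argument, if any.
                    string last = source.Substring(argStart, i - argStart).Trim();
                    if (last.Length > 0 || args.Count > 0) args.Add(last);
                    return args;
                }
            }
            else if (c == ',' && depth == 1)
            {
                // A comma at the call's own nesting level separates arguments.
                args.Add(source.Substring(argStart, i - argStart).Trim());
                argStart = i + 1;
            }
        }
        return null; // ran off the end without a matching ")"
    }
}
```

On the earlier example functionCall("\")", ')') this yields two arguments, "\")" and ')', and on functionCall((1+(1))*(2+2)) it yields the single argument (1+(1))*(2+2).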

Daniel LeCheminant
While regexes aren't well suited to it, .NET Regex supports balancing groups, so this case can be handled.
Orion Adrian
@Orion Adrian: Good point; I've added a note to that effect
Daniel LeCheminant
I cleansed the source code string by removing comments, carriage returns, and what I classified as preprocessor directives (anything starting with #). Also, I am almost certain (at least at this time) that nobody is using nested parens in the function calls we are using.
phsr
Also, the number of calls that I need to retrieve the arguments for is a very small subset of the function calls, so hopefully I shouldn't run into any funky syntax. Thanks!
phsr