For example, given this source:

void func1() {
    func3();
    if(qqq) {
         func2();
    }
    func4(
    );
}

It should be transformed to:

void func1() {
MYMACRO
    func3();
MYMACRO
    if(qqq) {
MYMACRO
         func2();
MYMACRO
    }
MYMACRO
    func4(
    );
MYMACRO
}

That is, I want to insert "MYMACRO\n" after every line that can be followed by a statement, but only inside functions.

How can I do this easily? Should I use regular expressions? What tools should I use?

For example, can gcc output the line numbers where statements begin (or end) inside functions?

@related http://stackoverflow.com/questions/3982868/how-to-tell-gcc-to-instrument-the-code-with-calls-to-my-own-function-each-line

@related http://stackoverflow.com/questions/3992587/what-profiler-should-i-use-to-measure-real-time-including-waiting-for-syscalls

+1  A: 

What are you trying to accomplish by doing this? Based on the description of the task, there is probably a much easier way to approach the problem. If you're sure that this is the best way to accomplish your task, read on.


You would have to implement some sort of rudimentary C language parser to do this. Since you are processing text, I would recommend using a scripting language like perl, python, or ruby to modify your text instead of writing a C program to do it.

Your parser will walk through the file a line at a time and, for each line, determine whether it needs to insert your macro. The parser will need to keep track of a number of things. First, it needs to keep track of whether or not it is currently inside a comment. When you encounter a /* sequence, set an "in comment" flag and clear it the next time you encounter a */ sequence. Whenever that flag is set, you will not add a macro invocation.

Also, you will need to keep track of whether or not you are inside a function. Assuming your code is fairly simple and straightforward, you can have a "brace counter" that starts at zero, increments whenever you encounter a {, and decrements whenever you encounter a }. If your brace counter is zero, then you are not inside a function and you shouldn't add a macro call. You will also want to add special code to detect and ignore braces that are part of a structure definition, array initializer, etc. Note that simple brace counting won't work if your code does more complicated things like:

void some_function (int arg) {
#ifdef CHECK_LIMIT_ONLY
    if (arg == 0) {
#else
    if (arg < 10) {
#endif
        // some code here
        ...
    }
}

While you could argue that snippet is simply a case of poorly-written code, it's just an example of the type of problem that you can run into. If your code has something in it that breaks simple brace counting, then this problem just got significantly more difficult. One way to tell if your code will break brace counting is if you reach the end of the file with a non-zero brace count or if at any point in time the brace count goes negative.
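The sanity check described above can be sketched in Python as follows. This is only a minimal illustration, not a complete C lexer: the function name `brace_balance_ok` is made up for this example, and it assumes that braces never appear inside string literals, character literals, or `//` comments.

```python
def brace_balance_ok(source: str) -> bool:
    """Return True if the brace count never goes negative and ends at zero.

    Assumption: only /* ... */ comments are skipped; braces inside strings,
    character literals, or // comments would be miscounted.
    """
    depth = 0
    in_comment = False
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if in_comment:
            if pair == "*/":
                in_comment = False
                i += 2
                continue
        elif pair == "/*":
            in_comment = True
            i += 2
            continue
        elif source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth < 0:
                # More closing than opening braces so far.
                return False
        i += 1
    return depth == 0
```

Run on the `#ifdef` snippet above, this returns False because the count ends at 1 rather than 0, which is the signal that simple brace counting cannot be trusted for that file.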

Once you can determine when you are in a function and not in a comment, you need to determine whether the line needs a macro inserted after it. You can start with a few simple rules, test the script, and see if there are any cases that it missed. For starters, any line ending in a semicolon is the end of a statement and you will want to insert a macro after it. Similar to counting braces, when you are inside a function you will want to count parentheses so that you can determine whether you are inside a function call, loop conditional, or other compound statement. If you are inside one of these, you will not add the macro. The other code locations to track are the start and end lines of a { ... } block. If a line ends in { or }, you will add a macro after it.
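Put together, these rules might look roughly like the Python sketch below. The function name `instrument` and its parameters are invented for illustration. It deliberately makes several simplifying assumptions: any brace depth above zero is treated as "inside a function" (so file-scope struct definitions and array initializers would be instrumented by mistake), string and character literals are not parsed, and preprocessor conditionals are ignored.

```python
def instrument(source: str, macro: str = "MYMACRO") -> str:
    """Insert `macro` on its own line after each statement-ending line.

    Assumptions (not a real C parser): brace depth > 0 means "inside a
    function"; braces, parentheses, and comment markers inside string or
    character literals are not recognized.
    """
    out = []
    brace_depth = 0
    paren_depth = 0
    in_comment = False
    for line in source.splitlines():
        code = []  # characters of this line that are outside comments
        i = 0
        while i < len(line):
            pair = line[i:i + 2]
            if in_comment:
                if pair == "*/":
                    in_comment = False
                    i += 2
                else:
                    i += 1
                continue
            if pair == "/*":
                in_comment = True
                i += 2
                continue
            if pair == "//":
                break  # rest of the line is a comment
            ch = line[i]
            if ch == "{":
                brace_depth += 1
            elif ch == "}":
                brace_depth -= 1
            elif ch == "(":
                paren_depth += 1
            elif ch == ")":
                paren_depth -= 1
            code.append(ch)
            i += 1
        out.append(line)
        stripped = "".join(code).rstrip()
        # Add the macro only after a line that ends a statement or opens/closes
        # a block, while still inside a function and outside any parentheses.
        if (stripped and stripped[-1] in ";{}"
                and brace_depth > 0 and paren_depth == 0 and not in_comment):
            out.append(macro)
    return "\n".join(out) + "\n"
```

Applied to the `func1` example from the question, this produces the requested output: the split `func4(` call gets a single macro after its closing `);`, and nothing is added after the function's final `}` because the brace depth is back to zero there.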

For a complicated task like this, you will definitely want to script something up, try it out on a relatively simple piece of code, and see what it gets wrong. Make adjustments to cover the cases you missed the first time and re-test. When it can parse the simple code correctly, give it something more complicated and see how well it does.

Update: To address the concerns that some people have expressed regarding the additional latency of adding print commands, remember that you don't have to print a timestamp at every macro call. Instead, have the macro call grab a timestamp and stick it onto a list. Once your program is done, print all the timestamps off of the list. That way, you save all the print-related delay until after your test is over.

bta
For full generality, of course, it needs to include trigraphs. They can have significance here, such as a string like `"ab??/"cd}"` or the simple use of `??<` and `??>`, or `??=` to start preprocessor commands. Most programs don't use trigraphs, but I don't know what the OP is working with.
David Thornley
It turns out, OP is trying to profile the code line by line. Excellent answer btw, +1
JoshD
I want to abstract away from the syntax details. For example, the compiler already knows which lines are statements. Can it just tell me?
Vi
@Vi: There may be multiple statements on one line and one statement may span multiple lines.
James McNellis
@James McNellis, Multiple statements on one line should be treated as one unit. A multi-line statement should be handled without being broken up.
Vi
@Vi- If you want to treat multiple statements on a line as one unit, then my algorithm should handle it that way. If you try to get the compiler to do the instrumentation for you, it will see them as separate statements. To combat the problem David brought up, I would recommend doing a global search-and-replace to replace all trigraph sequences with their associated characters before parsing the file.
bta
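The trigraph search-and-replace suggested in the comment above could be done as a preprocessing pass before parsing. The following Python sketch shows one way; the function name `replace_trigraphs` is made up for this example, and it assumes the source is later compiled with trigraph support, since otherwise the sequences are ordinary text and should be left alone.

```python
import re

# The nine C trigraph sequences and the characters they stand for.
TRIGRAPHS = {
    "=": "#", "/": "\\", "'": "^",
    "(": "[", ")": "]", "!": "|",
    "<": "{", ">": "}", "-": "~",
}

def replace_trigraphs(text: str) -> str:
    """Replace every ??x trigraph with its single-character equivalent."""
    return re.sub(r"\?\?([=/'()!<>-])",
                  lambda m: TRIGRAPHS[m.group(1)],
                  text)
```

After this pass, `??<` and `??>` show up as ordinary braces, so a brace counter sees them, and `??/` becomes a backslash, so an escaped quote inside a string can be recognized as such.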
A: 

Rewrite your sources so the following works :-)

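Instead of gcc ... file1.c file2.c ... do

# file1_macro.c and file2_macro.c are placeholder names for the instrumented copies.
# "\n" in the replacement text needs GNU sed.
sed -e 's/;/;\nMYMACRO/' file1.c > file1_macro.c
sed -e 's/;/;\nMYMACRO/' file2.c > file2_macro.c
gcc ... file1_macro.c file1extra.c \
        file2_macro.c file2extra.c \
    ...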
pmg
How would that work with the code `struct boo { int trick; float treat; };` or `for (i = 9; i < 99 ; i++) spank(i);`?
nategoose
It wouldn't work with that code ... that's the reason for rewriting the sources in the first place: remove `struct` definitions, `for`, initializations, ... and everything else that stops the "`sed` trick" from working.
pmg
I want it to work on a large existing code base that I don't yet fully understand but need to fix/hack.
Vi
A: 

Here's some quick and dirty C# code. Basically just primitive file IO stuff. It's not great, but I did whip it up in around 3 minutes. This code assumes that function blocks are demarcated with a comment line of "//FunctionStart" at the beginning and "//FunctionEnd" at the end. There are more elegant ways of doing this; this is the fast/dirty/hacky approach.

Using a managed app to do this task is probably overkill, but you can do a lot of custom stuff by simply adding on to this function.

        // Requires: using System.IO;
        private void InsertMacro(string filePath)
        {
            //Declarations:
            string tempPath = filePath + ".tmp";
            bool validBlock = false;

            //Go through source file line by line.
            //The using blocks close both streams before the files are swapped below.
            using (StreamReader sr = new StreamReader(filePath))
            using (StreamWriter sw = new StreamWriter(tempPath))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    //Tolerate whitespace around the marker comments:
                    string marker = line.Trim();
                    if (marker == "//FunctionStart")
                        validBlock = true;
                    else if (marker == "//FunctionEnd")
                        validBlock = false;

                    sw.WriteLine(line);

                    //Every line inside a marked block is followed by the macro:
                    if (validBlock)
                        sw.WriteLine("MYMACRO");
                }
            }

            //Replace legacy source with updated source:
            File.Delete(filePath);
            File.Move(tempPath, filePath);
        }
kmarks2