views:

459

answers:

3

Presently I have a some legacy code, which generates the op code. If the code has more number of macros then the code generation takes so much of time (In terms of hours!!). I have gone through the logic, they are handling the macro by searching for it and doing a replace of each variable in it some thing like inlining.
Is there a way that I can optimize it without manipulating the string?

A: 

I have an application which has its own grammer. It supports all types of datatypes that a typical compiler supports (Even macros). More precisely it is a type of compiler which generates the opcodes by taking a program (which is written using that grammer) as input. For handling the macros, it uses the text replacement logic For Example:

Macro Add (a:int, b:int)

int c = a + b

End Macro

// Program Sum

..

int x = 10, y = 10;

Add(x, y);

..

// End of the program

After replacement it will be

// Program Sum

..

int x = 10, y = 10;

int c = x + y

..

// End of program

This text replacement is taking so much of time i.e., replacing the macro call with macro logic. Is there a optimal way to do it?

Vinay
If you take your whole file in a std::String, then it's no wonder replacing macros in a huge file takes time. maybe have each line of code as an element in a deque. replacing is much facter then i suppose
Johannes Schaub - litb
A: 

This is really hard to answer without knowing more of your preprocessor/parse/compile process. One idea would be to store the macro names in a symbol table. When parsing, check text tokens against that table first, If you find a match, write the replacement into a new string, and run that through the parser, then continue parsing the original text following the macrto's close parens.

Depending on your opcode syntax, another idea might be - when you encounter the macro definition while parsing, generate the opcodes, but put placeholders in place of the arguments. Then when the parser encounter calls to the macro, generate the code for evaluating the arguments, and insert that code in place of the placeholders in the pre-generated macro code.

AShelly
It does two iterations. First iteration scans for macros and does the text replacement and constructs the tree. In the second iteration it generates the opcode by taking this tree as input.
Vinay
+1  A: 

You must tokenize your input before starting this kind of process. (I can't recommend the famous Dragon Book highly enough - even the ancient edition stood the test of time, the updated 2006 version looks great). Compiling is the sort of job that's best split up into smaller phases: if your first phase performs lexical analysis into tokens, splitting lines into keywords, identifiers, constants, and so on, then it's much simpler to find the references to macros and look them up in a symbol table. (It's also relatively easier to use a tool like lex or flex or one of their modern equivalents to do this job for you, than to attempt to do it from scratch).

The 'clue' seems to be if the code has more number of macros then the code generation takes so much of time. That sounds like the process is linear in the number of macros, which is certainly too much. I'm assuming this process occurs one line at a time (if your language allows that, obviously that has enormous value, since you don't need to treat the program as one huge string), and the pseudocode looks something like

for(each line in the program)
{
    for(each macro definition)
    {
        test if the macro appears;
        perform replacement if needed;
    }
}

That clearly scales with the number of macro definitions.

With tokenization, it looks something like this:

for(each line in the program)
{
    tokenize the line;
    for(each token in the line)
    {
        switch(based on the token type)
        {
            case(an identifier)
                lookup the identifier in the table of macro names;
                perform replacement as necessary;
            ....
        }
    }
}

which scales mostly with the size of the program (not the number of definitions) - the symbol table lookup can of course be done with more optimal data structures than looping through them all, so that no longer becomes the significant factor. That second step is something that again programs like yacc and bison (and their more modern variants) can happily generate code to do.

afterthought: when parsing the macro definitions, you can store those as a token stream as well, and mark the identifiers that are the 'placeholder' names for parameter replacement. When expanding a macro, switch to that token stream. (Again, something things like flex can easily do).

Chris