views:

165

answers:

4

I am currently in the process of writing a compiler and I seem to have run into some problems getting it to output code that executes in a decent timeframe.

A brief overview of the compiler:

7Basic is a compiler that aims to compile 7Basic code directly into machine code for the target architecture / platform. Currently 7Basic generates x86 assembly given a source file.

The problem is that the assembly code generated by the compiler is slow and inefficient.

For example, this code (which compiles down to this assembly code) takes nearly 80.47 times longer to execute than the equivalent C code.

Part of the problem is that the compiler is generating code like the following:

push eax
push 5000000
pop ebx
pop eax

Instead of the more logical:

mov ebx,5000000

...which accomplishes the same thing.

My question is: what are some techniques to avoid this sort of problem? The parser basically uses recursion to parse the expressions, so the code generated reflects this.

+8  A: 

One technique is called peephole optimisation. This takes an iterative approach to cleaning up assembly code. Essentially you scan through the assembly code, looking at only two or three instructions at a time, and see whether you can reduce them into something simpler. For example,

push eax        ; 1
push 5000000    ; 2
pop ebx         ; 3
pop eax         ; 4

The first step would look at lines 2 and 3, and replace that with:

push eax        ; 1
mov ebx,5000000 ; 2a
pop eax         ; 4

Second, you might consider 1 and 4, and if eax is not touched in the middle instruction, remove them both, leaving what you want:

mov ebx,5000000 ; 2a
Greg Hewgill
+1: beat me to it...
Jerry Coffin
Okay, could this be done as the code is being generated? That would be better.
George Edison
Usually peephole optimisation is run as a separate pass after you've generated an intermediate assembly output. If you're compiling for multiple architectures, then it would necessarily have to be run *after* you've compiled to an IL form, and then to your target assembly language.
Greg Hewgill
Well, I kind of have a class hierarchy for the assembly code output module, so each architecture's output module derives from a base class. Aren't there some optimizations common to all of my supported architectures (currently planning to include x86 and x86-64)?
George Edison
@Greg: Crenshaw showed one method for doing it in-line. Probably not as globally efficient as another pass, but tradeoffs *are* the name of the game.
dmckee
+4  A: 

You might want to consider generating C code rather than assembly and then let a C compiler (e.g. gcc) handle the code generation for you. There's no point trying to re-invent the wheel.

Paul R
Eventually the compiler is going to generate machine code, so this is not an option.
George Edison
Eventually the C compiler is going to generate machine code, too.
I. J. Kennedy
What I meant was that eventually the compiler will directly generate the machine code itself.
George Edison
If your compiler is going to have a suitable open source license then you could possibly integrate some of the gcc back end, if you don't want to use it as part of a tool chain.
Paul R
+1  A: 

There are any number of reasons a particular code generator may emit the instruction sequence you list. The most likely is that the code generator you're using just isn't trying very hard to emit optimum code.

This pattern of emitted code suggests to me that your code generator doesn't know that the x86 has "mov immediate" instructions that embed the constant value into the instruction stream directly. The x86 encoding for opcodes with immediate values can get a little complicated (variable length R/M bytes) but this is already required if you want to use many of the x86 instructions.

This emitted code also suggests that the code generator doesn't know that EAX is not modified by the EBX instructions. This feels like the codegen is template driven rather than discrete logic.

This kind of codegen happens when the compiler's internal intermediate representation of operations is not detailed enough to represent all the facets of the target architecture. This is particularly true if the code generator architecture was originally designed for a RISC instruction set but has been repurposed to emit x86 instructions. RISC architecture tend to have very few and very simple load, store, and operate reg/reg instructions, whereas the x86 instruction set has evolved organically over decades to include a wide variety of opcodes that operate directly on memory, inline constants into the instructions, and a whole mess of other stuff. If the compiler's intermediate representation (expression graph) is wired for RISC, it will be difficult to make it grok the wide variety and subtleties of x86.

dthorpe
Actually I wrote the code generater :)
George Edison
Cool. Then there is hope that this codegen can be improved. ;> Step 1: figure out how to recognize constant value loads in your intermediate representation and emit them as mov reg,imm. Step2: figure out why your code generator is pushing and popping eax in this example, since it is not relevant to the core operation at all. Smells of bug.
dthorpe
@dthorpe: It's not a bug. It's supposed to do that simply because of the way expressions are evaluated. This is why I asked the question.
George Edison
Well then you need to work on the way your expressions are evaluated if its adding cruft to your codegen. ;> Peephole optimizations (as mentioned in another answer) can help clean up the mess left by a poor codegen, but in my opinion it's better to emit better code to begin with.
dthorpe
The compiler is not supposed to have any understanding of the architecture for the generated machine code. Otherwise it will be impossible to add another architecture.
George Edison
+1  A: 

I'm taking a compiler course at the moment. I've made some great progress in outputting efficient code, but you should look into the dragon book. It is a rite of passage. You should take a look at the code from Jeremy Bennett's book, Introduction to Compiling Techniques: A First Course Using ANSI C, LEX, and YACC. The book itself is very hard to find, but you can download the source code for the compiler free from

http://www.jeremybennett.com/publications/download.html

The code generator file (cg.c) has some functions for generating fairly optimized code. The target language isn't i386, but you should consider looking at how he describes registers and keeps track of where symbol table entries are stored. His output assembly could be further optimized, but it provides a great base for producing code that could rival the output from gcc -S in some regards.

One general optimization would be to subtract the stack pointer to reserve space for all local and temporary variables upon entering a function. Then just reference the offsets instead of constantly pushing/popping.

For example, if your intermediate code is a list of quadruples, you should simply iterator through it for each function and keep track of the maximum offset. Then output the line to subtract the amount of space on the stack. This removes the need to push so many variables on and off. To remove the need to pop them, you can simply mov their value from their offset on the stack into a register. This will significantly improve performance.

Kizaru
Great advice - the language doesn't have the concept of scope yet, nor does it have functions / subroutines. Still a work in progress. But when it does, I'll be sure to have local variables go on the stack.
George Edison
What is your intermediate code representation? TAC/Quadruples?
Kizaru
Don't have one :) The compiler sends 'pseudo-commands' to the output module which generates the exact assembly instructions.
George Edison
You should reconsider your strategy--you should generate some sort of intermediate code so you can make some necessary optimizations before generating target code.
Kizaru