views:

588

answers:

3

The deceptively simple foundation of dynamic code generation within a C/C++ framework has already been covered in another question. Are there any gentle introductions into topic with code examples?

My eyes are starting to bleed staring at highly intricate open source JIT compilers when my needs are much more modest.

Are there good texts on the subject that don't assume a doctorate in computer science? I'm looking for well worn patterns, things to watch out for, performance considerations, etc. Electronic or tree-based resources can be equally valuable. You can assume a working knowledge of (not just x86) assembly language.

+2  A: 

I'm not aware of any sources specifically related to JITs, but I imagine that it's pretty much like a normal compiler, only simpler if you aren't worried about performance.

The easiest way is to start with a VM interpreter. Then, for each VM instruction, generate the assembly code that the interpreter would have executed.

To go beyond that, I imagine that you would parse the VM byte codes and convert them into some sort of suitable intermediate form (three address code? SSA?) and then optimize and generate code as in any other compiler.

For a stack based VM, it may help to to keep track of the "current" stack depth as you translate the byte codes into intermediate form, and treat each stack location as a variable. For example, if you think that the current stack depth is 4, and you see a "push" instruction, you might generate an assignment to "stack_variable_5" and increment a compile time stack counter, or something like that. An "add" when the stack depth is 5 might generate the code "stack_variable_4 = stack_variable_4+stack_variable_5" and decrement the compile time stack counter.

It is also possible to translate stack based code into syntax trees. Maintain a compile-time stack. Every "push" instruction causes a representation of the thing being pushed to be stored on the stack. Operators create syntax tree nodes that include their operands. For example, "X Y +" might cause the stack to contain "var(X)", then "var(X) var(Y)" and then the plus pops both var references off and pushes "plus(var(X), var(Y))".

Glomek
+4  A: 

Well a pattern I've used in emulators goes something like this:

typedef void (*code_ptr)();
unsigned long instruction_pointer = entry_point;
std::map<unsigned long, code_ptr> code_map;


void execute_block() {
    code_ptr f;
    std::map<unsigned long, void *>::iterator it = code_map.find(instruction_pointer);
    if(it != code_map.end()) {
     f = it->second
    } else {
     f = generate_code_block();
     code_map[instruction_pointer] = f;
    }
    f();
    instruction_pointer = update_instruction_pointer();
}

void execute() {
    while(true) {
     execute_block();
    }
}

This is a simplification, but the idea is there. Basically, every time the engine is asked to execute a "basic block" (usually a everything up to next flow control op or whole function in possible), it will look it up to see if it has already been created. If so, execute it, else create it, add it and then execute.

rinse repeat :)

As for the code generation, that gets a little complicated, but the idea is to emit a proper "function" which does the work of your basic block in the context of your VM.

EDIT: note that I haven't demonstrated any optimizations either, but you asked for a "gentle introduction"

EDIT 2: I forgot to mention one of the most immediately productive speed ups you can implement with this pattern. Basically, if you never remove a block from your tree (you can work around it if you do but it is way simpler if you never do), then you can "chain" blocks together to avoid lookups. Here's the concept. Whenever you return from f() and are about to do the "update_instruction_pointer", if the block you just executed ended in either a call, unconditional jump, or didn't end in flow control at all, then you can "fixup" its ret instruction with a direct jmp to the next block it'll execute (cause it'll always be the same one) if you have already emited it. This makes it so you are executing more and more often in the VM and less and less in the "execute_block" function.

Evan Teran
Nice pattern, makes sense, thanks.
postfuturist
No problem, if you want some "simple" examples, look into the emulation community. They tend to call it "dynamic recompilation", but it really is the same exact concept as JIT (except dyn-rec makes little to no effort to *improve* the code it is given to translate).
Evan Teran
A: 

Get yourself a copy of Joel Pobar's book on Rotor (when it's out), and delve through the source to the SSCLI. Beware, insanity lies within :)

OJ
you just got my most excited attention for this "rotor" thing.
Hugo
It's a great thing to play with. Seriously.
OJ