
In the LLVM tutorials and examples, the compiler outputs LLVM IR by making calls like this:

return Builder.CreateAdd(L, R, "addtmp");

But many interpreters are written like this:

switch (opcode) {
    case ADD:
        result = L + R;
        break;
    ...
}

How would you extract each of these code snippets to make a JIT with LLVM without having to re-implement each opcode in LLVM IR?

A:

Okay, first take each of your code snippets and refactor it into its own function. Your code then becomes:

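#include <stdint.h>

/* Each opcode body becomes a standalone function. */
void addOpcode(uint32_t *result, uint32_t L, uint32_t R) {
    *result = L + R;
}

/* The interpreter loop now only dispatches to those functions. */
switch (opcode) {
    case ADD:
        addOpcode(&result, L, R);
        break;
    ...
}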
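To make the refactored shape concrete, here is a minimal self-contained sketch of a complete interpreter loop written that way. The register-machine format (Opcode, Instr, the regs array) and the helpers subOpcode, mulOpcode, run and example are hypothetical additions for illustration; only addOpcode comes from the answer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcode set and instruction format, for illustration only.
enum Opcode : uint8_t { ADD, SUB, MUL, HALT };

struct Instr {
    Opcode op;
    uint8_t dst, lhs, rhs;  // register indices
};

// Each opcode body lives in its own function; these are the functions
// you would later move to a separate file and compile to LLVM IR.
void addOpcode(uint32_t *result, uint32_t L, uint32_t R) { *result = L + R; }
void subOpcode(uint32_t *result, uint32_t L, uint32_t R) { *result = L - R; }
void mulOpcode(uint32_t *result, uint32_t L, uint32_t R) { *result = L * R; }

// The loop still decodes every instruction on every run, but the switch
// now only dispatches to the opcode functions.
void run(const std::vector<Instr> &program, uint32_t *regs) {
    for (const Instr &in : program) {
        switch (in.op) {
        case ADD: addOpcode(&regs[in.dst], regs[in.lhs], regs[in.rhs]); break;
        case SUB: subOpcode(&regs[in.dst], regs[in.lhs], regs[in.rhs]); break;
        case MUL: mulOpcode(&regs[in.dst], regs[in.lhs], regs[in.rhs]); break;
        case HALT: return;
        default: assert(false && "unknown opcode"); return;
        }
    }
}

// Usage: computes r3 = (r0 + r1) * r2 with r0 = 2, r1 = 3, r2 = 4.
uint32_t example() {
    uint32_t regs[4] = {2, 3, 4, 0};
    std::vector<Instr> program = {
        {ADD, 3, 0, 1},
        {MUL, 3, 3, 2},
        {HALT, 0, 0, 0},
    };
    run(program, regs);
    return regs[3];
}
```

Because the behavior is unchanged, the existing interpreter tests should keep passing after this step; example() returns 20.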

Okay, after doing this your interpreter should still run. Now take all the new functions and place them in their own file. Compile that file to LLVM bitcode using either llvm-gcc or clang (with -emit-llvm), and instead of generating native code from it, run it through the "cpp" backend (llc -march=cpp). That will generate C++ code that builds the in-memory IR for the whole compilation unit. You can pass options to limit the output to specific functions, etc.; you probably want "-cppgen=module".

Now, back in your interpreter loop, instead of directly executing the original code, emit IR that glues together calls to the functions the generated C++ code created, then pass the result to some optimization passes (which can inline the opcode bodies) and a native code generator. Gratz on the JIT ;-) You can see an example of this in a couple of LLVM projects, such as the vm_ops in llvm-lua.

Louis Gerbarg
Wonderful! I thought it would be something like that, with LLVM inlining all the functions.
joeforker
How does this compare to a call-threaded interpreter, where you only JIT a series of CALL instructions to each bytecode implementation, inline the implementations of only a few opcodes (most likely the BRANCH opcodes), and end each opcode implementation with RET?
joeforker
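A rough way to picture the call-threaded side of that comparison: the plain-C++ sketch below approximates call threading by decoding the bytecode once into a vector of (handler pointer, operands) entries and then simply calling them in order. It is an analogy under stated assumptions, not a JIT: it emits no machine code, handles no branches, and every name in it (Opcode, Instr, Handler, Call, translate, execute, addOp, subOp, mulOp) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register-machine bytecode, for illustration only.
enum Opcode : uint8_t { ADD, SUB, MUL, NUM_OPCODES };

struct Instr { Opcode op; uint8_t dst, lhs, rhs; };

// Every handler shares one signature so it can be called through a pointer.
using Handler = void (*)(uint32_t *regs, uint8_t dst, uint8_t lhs, uint8_t rhs);

void addOp(uint32_t *r, uint8_t d, uint8_t a, uint8_t b) { r[d] = r[a] + r[b]; }
void subOp(uint32_t *r, uint8_t d, uint8_t a, uint8_t b) { r[d] = r[a] - r[b]; }
void mulOp(uint32_t *r, uint8_t d, uint8_t a, uint8_t b) { r[d] = r[a] * r[b]; }

// One pre-decoded step: which handler to call and with which operands.
struct Call { Handler fn; uint8_t dst, lhs, rhs; };

// Translate once: decode each instruction a single time into a direct call.
// A real call-threaded JIT would emit native CALL instructions here instead.
std::vector<Call> translate(const std::vector<Instr> &program) {
    static const Handler table[NUM_OPCODES] = {addOp, subOp, mulOp};
    std::vector<Call> calls;
    calls.reserve(program.size());
    for (const Instr &in : program) {
        assert(in.op < NUM_OPCODES && "unknown opcode");
        calls.push_back({table[in.op], in.dst, in.lhs, in.rhs});
    }
    return calls;
}

// Execute: no opcode decoding and no switch, just the straight-line calls.
// Branches are not handled; this sketch only covers straight-line code.
void execute(const std::vector<Call> &calls, uint32_t *regs) {
    for (const Call &c : calls) c.fn(regs, c.dst, c.lhs, c.rhs);
}
```

The main design difference from the answer's approach: here each handler stays an opaque indirect call at run time, whereas once LLVM has the IR of the opcode functions, its optimizer can inline them into the glued-together sequence and optimize across opcode boundaries.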
I don't quite follow this. Are you saying to pass all the opcode functions into LLVM, and when you output it back to C, it will automatically have a JIT built in?
Unknown
You are not outputting it as C; you are outputting C++ code that builds the in-memory compiled IR representation of the function.
Louis Gerbarg