views:

629

answers:

7

How does a virtual machine generate native machine code on the fly and execute it?

Assuming you can figure out what are the native machine op-codes you want to emit, how do you go about actually running it?

Is it something as hacky as mapping the mnemonic instructions to binary codes, stuffing it into an char* pointer and casting it as a function and executing?

Or would you generate a temporary shared library (.dll or .so or whatever) and load it into memory using standard functions like LoadLibrary ?

+1  A: 

As far as i know it compiles everything in memory because it has to run some heuristics to to optimize the code (i.e.: inlining over time) but you can have a look at the Shared Source Common Language Infrastructure 2.0 rotor release. The whole codebase is identical to .NET except for the Jitter and the GC.

Erick Sgarbi
+6  A: 

You can just make the program counter point to the code you want to execute. Remember that data can be data or code. On x86 the program counter is the EIP register. The IP part of EIP stands for instruction pointer. The JMP instruction is called to jump to an address. After the jump EIP will contain this address.

Is it something as hacky as mapping the mnemonic instructions to binary codes, stuffing it into an char* pointer and casting it as a function and executing?

Yes. This is one way of doing it. The resulting code would be cast to a pointer to function in C.

cdv
+1  A: 

As well as Rotor 2.0 - you could also take a look at the HotSpot virtual machine in the OpenJDK.

Christopher Dolan
+6  A: 

Is it something as hacky as mapping the mnemonic instructions to binary codes, stuffing it into an char* pointer and casting it as a function and executing?

Yes, if you were doing it in C or C++ (or something similar), that's exactly what you'd do.

It appears hacky, but that's actually an artifact of the language design. Remember, the actual algorithm you want to use is very simple: determine what instructions you want to use, load them into a buffer in memory, and jump to the beginning of that buffer.

If you really try to do this, though, make sure you get the calling convention right when you return to your C program. I think if I wanted to generate code I'd look for a library to take care of that aspect for me. Nanojit's been in the news recently; you could look at that.

+4  A: 

Yup. You just build up a char* and execute it. However, you need to note a couple details. The char* must be in an executable section of memory and must have proper alignment.

In addition to nanojit you can also check out LLVM which is another library that's capable of compiling various program representations down to a function pointer. It's interface is clean and the generated code tends to be efficient.

+1  A: 

About generating a DLL: the additional required I/O for that, plus linking, plus the complexity of generating the DLL format, would make that much more complicate, and above all they'd kill performance; additionally, in the end you still call a function pointer to the loaded code, so... Also, JIT compilation can happen one method at a time, and if you want to do that you'd generate lots of small DLLs.

About the "executable section" requirement, calling mprotect() on POSIX systems can fix the permissions (there's a similar API on Win32). You need to do that for a big memory segment instead that once per method since it'd be too slow otherwise.

On plain x86 you wouldn't notice the problem, on x86 with PAE or 64bit AMD64/Intel 64 bit machines you'd get a segfault.

Blaisorblade
+1  A: 

Is it something as hacky as mapping the mnemonic instructions to binary codes, stuffing it into an char* pointer and casting it as a function and executing?

Yes that works-

To do this in windows you must set PAGE_EXECUTE_READWRITE to the allocated block

void (MyFunc)() = (void ()()) VirtualAlloc(NULL, sizeofblock, MEM_COMMIT, PAGE_EXECUTE_READWRITE);

//Now fill up the block with executable code and issue-

MyFunc();

Maruf Maniruzzaman