Hello, I'm developing a scripting language that compiles to its own virtual machine, a simple one with instructions that operate on data such as points, vectors, floats and so on. A memory cell is represented this way:

struct memory_cell
{
    u32 id;
    u8 type;

    union
    {
        u8 b; /* boolean */
        double f; /* float */
        struct { double x, y, z; } v; /* vector */
        struct { double r, g, b; } c; /* color */
        struct { double r, g, b; } cw; /* color weight */
        struct { double x, y, z; } p; /* point variable */
        struct { u16 length; memory_cell **cells; } l; /* list variable */
    };
};

Instructions are generic and able to work on many different operand types. For example,

ADD dest, src1, src2

can work with floats, vectors, points and colors, setting the type of the destination according to the operands.

The main execution cycle just checks the opcode of the instruction (which is a struct containing unions to define any kind of instruction) and executes it. I used a simplified approach in which I don't have registers, just a big array of memory cells.

I was wondering whether a JIT could help me get better performance, and how to achieve it.

As I said, the best implementation I have reached so far is something like this:

 void VirtualMachine::executeInstruction(instr i)
 {
     u8 opcode = (i.opcode[0] & (u8)0xFC) >> 2;

     if (opcode >= 1 && opcode <= 17) /* RTL instruction */
     {
        memory_cell *dest;
        memory_cell *src1;
        memory_cell *src2;

        /* fetching destination */
        switch (i.opcode[0] & 0x03)
        {
            /* skip fetching for optimization */
            case 0: { break; }
            case MEM_CELL: { dest = memory[stack_pointer+i.rtl.dest.cell]; break; }
            case ARRAY_VAL: { dest = memory[stack_pointer+i.rtl.dest.cell]->l.cells[i.rtl.dest.index]; break; }
            case ARRAY_CELL: { dest = memory[stack_pointer+i.rtl.dest.cell]->l.cells[(int)i.rtl.dest.value]; break; }
        }

     /* omitted code */

     switch (opcode)
     {
         case ADD:
         {
             if (src1->type == M_VECTOR && src2->type == M_VECTOR)
             {
                 dest->type = M_VECTOR;
                 dest->v.x = src1->v.x + src2->v.x;
                 dest->v.y = src1->v.y + src2->v.y;
                 dest->v.z = src1->v.z + src2->v.z;
              }

      /* omitted code */

Is it easy/convenient to try JIT compilation? I really don't know where to start, which is why I'm asking for some advice.

Apart from that, is there any other advice I should consider while developing it?

This virtual machine should be fast enough to compute shaders for a ray tracer, but I still haven't done any kind of benchmark.

+7  A: 

A VM is a big task to consider. Have you considered basing your VM on something like LLVM?

LLVM will provide a good base to start from and there are plenty of example projects which you can use for understanding.

Jeff Foster
+5  A: 

Before writing a JIT ("Just-in-time") compiler, you should at least consider how you would write a "Way-ahead-of-time" compiler.

That is, given a program consisting of instructions for your VM, how would you produce a program consisting of x86 (or whatever) instructions, that does the same as the original program? How would you optimise the output for different instruction sets, and different versions of the same architecture? The example opcode you've given has quite a complicated implementation, so which opcodes would you implement "inline" by just emitting code that does the job, and which would you implement by emitting a call to some shared code?
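To make the "inline or call" question concrete, here is a minimal sketch of an emitter for ADD. It assumes the operand kind is already known at translation time, and it emits C-like text rather than real x86 purely for readability. The names Kind, emitAdd and vm_add_vec are hypothetical and not part of the question's VM.

```cpp
#include <string>

// Hypothetical operand kinds, loosely mirroring the M_VECTOR-style tags
// in the question.
enum class Kind { Float, Vector };

// Emit target code for "ADD dest, src1, src2" when the operand kind is
// known at translation time. Cells are addressed as m[index] in the output.
std::string emitAdd(Kind kind, int dest, int src1, int src2) {
    const std::string d = "m[" + std::to_string(dest) + "]";
    const std::string a = "m[" + std::to_string(src1) + "]";
    const std::string b = "m[" + std::to_string(src2) + "]";
    if (kind == Kind::Float) {
        // Cheap case: emit the operation inline.
        return d + ".f = " + a + ".f + " + b + ".f;\n";
    }
    // Larger case: emit a call to shared runtime code
    // (vm_add_vec is an assumed helper, not an existing function).
    return "vm_add_vec(&" + d + ", &" + a + ", &" + b + ");\n";
}
```

Calling emitAdd(Kind::Float, 0, 1, 2) yields a single inline assignment, while the vector case yields a call. A real backend would make the same choice per opcode, just with machine instructions instead of text.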

A JIT has to be able to do this, and it also has to make decisions while the VM is running about which code it does it to, when it does it, and how it represents the resulting mixture of VM instructions and native instructions.
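One common way to handle the "which code, and when" decision is to count how often each function runs and compile it once it crosses a threshold. The sketch below shows only that bookkeeping. HotnessTracker and its threshold are assumptions for illustration, not something the question's VM already has.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical per-function execution counter used to decide when a
// function is "hot" enough to be worth compiling.
class HotnessTracker {
public:
    explicit HotnessTracker(std::uint32_t threshold) : threshold_(threshold) {}

    // Record one call. Returns true on the call that brings the count
    // up to the threshold, and false on every other call.
    bool recordCall(std::uint32_t functionId) {
        return ++counts_[functionId] == threshold_;
    }

private:
    std::uint32_t threshold_;
    std::unordered_map<std::uint32_t, std::uint32_t> counts_;
};
```

The interpreter would call recordCall on function entry and hand the function to the compiler when it returns true. Everything before that point keeps running interpreted.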

If you're not already an assembly-jockey, then I don't recommend writing a JIT. That's not to say "don't do it ever", but you should become an assembly-jockey before you start in earnest.

An alternative would be to write a non-JIT compiler to convert your VM instructions (or the original scripting language) to Java bytecode, or LLVM, as Jeff Foster says. Then let the toolchain for that bytecode do the difficult, CPU-dependent work.

Steve Jessop
+1  A: 

Steve Jessop has a point: a JIT compiler is way harder than a normal compiler. And a normal compiler is hard by itself.

But, reading the last part of the question, I wonder if you really want a JIT compiler.

If your problem is like this:

I want to create a ray tracing program which allows users to provide their own shader procedures etc. using my own domain-specific language. It goes OK. I have my language defined and the interpreter implemented, and it works nicely and correctly. But it's slow. How can I execute it as native code?

Then here's what I used to do in similar situations:

  • Translate your user-provided procedures to C functions that can be called from your program.

  • Write them out to a normal C source file with proper #includes etc.

  • Compile them as a .dll (or .so on *nix) using a normal C compiler.

  • Load the .dll dynamically in your program, look up your function pointers and use them in your ray tracer in place of the interpreted versions.
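The first two steps above can be sketched roughly as follows. This assumes the instructions have already been resolved to plain float operations on an array of doubles. FloatInstr and generateCSource are hypothetical names, and a real translator would have to cover vectors, colors and lists as well.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, already type-resolved float instruction:
// m[dest] = m[src1] <op> m[src2].
struct FloatInstr { char op; int dest, src1, src2; };

// Build the text of a C source file exporting one function, `name`,
// that operates on a caller-supplied array of doubles.
std::string generateCSource(const std::string& name,
                            const std::vector<FloatInstr>& code) {
    std::ostringstream out;
    out << "#include <math.h>\n\n";
    out << "void " << name << "(double *m)\n{\n";
    for (const FloatInstr& i : code) {
        out << "    m[" << i.dest << "] = m[" << i.src1 << "] "
            << i.op << " m[" << i.src2 << "];\n";
    }
    out << "}\n";
    return out.str();
}
```

The resulting text would be written to a file and compiled with something like `cc -shared -fPIC shade.c -o libshade.so`. The function pointer can then be looked up with dlopen/dlsym on POSIX systems or LoadLibrary/GetProcAddress on Windows, where the generated function would probably also need an export annotation.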

Some notes:

  • In some environments it might be impossible: no access to a C compiler, or a system policy that forbids you from loading your own DLL. So check before you try it.

  • Do not discard your interpreter. Keep it as a reference implementation of your language.

Tomek Szpakowicz
"(dumb but it happens)". Seems a bit weird to have a C compiler but no dynamic linking. But not having a C compiler at all is pretty common, if you consider that most code doesn't run on PCs...
Steve Jessop
@Steve: I guess I'll remove this comment. It was about a system policy restricting the right to use your own code (exe's, dll's etc.), not the lack of a compiler. I know it happens. Anyway, if the user cannot load their own (as opposed to installed by admin) code, a program incorporating a JIT needs to operate with somehow elevated privileges as well. In some environments you won't be able to execute a block of data as binary code (protection against buffer overflows etc.), so you'll need to load it as shared libraries anyway.
Tomek Szpakowicz
Yes, or depending on the system it might be the other way round - JITs can allocate executable memory with mid-level privileges, but it requires digital signing (or kernel-level privileges) to authorise a dll to be loaded. I can only assume the assumption is that anyone smart enough to write a JIT and demonstrate it working, is smart enough not to write malicious code into memory and execute it. Whereas any fool can load a library, and hence shouldn't be allowed to ;-)
Steve Jessop
@Steve: And anyone smart enough to create a nuclear bomb is also smart enough not to do this... Egh... Wait... Hmm... Damn it!
Tomek Szpakowicz
Yeah, I'm not saying the reasoning is clinical, just that they must feel something along those lines - the risk/reward ratio for one is lower than the other.
Steve Jessop