tags:
views: 2058
answers: 6

I am going to implement a virtual machine on x86 and I wonder what kind of design would yield the best results. What should I concentrate on to squeeze out the most performance? I plan to implement the whole virtual machine in x86 assembly.

I don't have many instructions and I can choose their form. The instructions map directly onto Smalltalk's block syntax. Here is the instruction design I was thinking of:

^ ...              # return
^ null             # return nothing
object             # address of an object
... selector: ...  # message send (here arity 1, selector #selector:)
var := ...         # set
var                # get

The sort of VM inner loop I was thinking about:

mov eax, [esi]        ; fetch: the low 16 bits are the instruction
add esi, 2            ; advance past the 2-byte instruction
mov ecx, eax
and eax, 0xff         ; eax = operand byte
and ecx, 0xff00       ; ecx = opcode * 256
shr ecx, 8            ; ecx = opcode
jmp [operations + ecx*4]

align 4
operations:
    dd op_retnull
    dd op_ret
    ; and so on...

op_retnull:           ; dispatched here for retnull
    ; ... retnull action
op_ret:               ; "ret" itself is a reserved mnemonic, hence the prefix
    ; ... ret action
; etc.

Don't start asking why I need yet another virtual machine implementation. Interpretive routines aren't stock stuff you can just pick up whenever you need them. Most virtual machines proposed elsewhere are weighted towards portability at the cost of performance. My goal is not portability; my goal is performance.

The reason this interpreter is needed at all is that Smalltalk blocks don't all end up being interpreted the same way:

A := B subclass: [
    def a:x [^ x*x]
    clmet b [...]
    def c [...]
    def d [...]
]

[ 2 < x ] whileTrue: [...]

(i isNeat) ifTrue: [...] ifFalse: [...]

List fromBlock: [
    "carrots"
    "apples"
    "oranges" toUpper
]

The real benefit I need from interpretive routines is the choice of context in which to read the program. Of course, a good compiler should simply compile the obvious cases most of the time, like 'ifTrue:ifFalse:' or 'whileTrue:', or the list example. But the need for an interpreter doesn't just disappear, because you may always hit a case where you can't be sure the block gets the treatment you expect.

+1  A: 

The point of an interpreter is portability, most of the time. The fastest approach I can think of is to generate x86 code in memory directly, just like JIT compilers do, but then, of course, you don't have an interpreter anymore. You have a compiler.

However, I'm not sure writing the interpreter in assembler will give you the best performance (unless you're an assembler guru and your project is very limited in scope). Using a higher-level language can help you focus on better algorithms for, say, symbol lookup and register allocation strategies.

Carl Seleborg
The point of interpreters like the Java VM or the .NET VM is portability at the cost of performance. Interpretive routines have many other uses.
Cheery
A: 

I have to ask, why create a virtual machine with a focus on performance? Why not just write x86 code directly? Nothing can possibly be faster.

If you want a very fast interpreted language, look at Forth. Their design is very tidy and very easy to copy.

S.Lott
Because I need to interpret the instructions in a variety of ways. The other reason besides portability I can think of would be convenience, or saving space with more specific instructions. It is all stuff you can't easily turn into a library, except in the form of VM generators.
Cheery
@Cheery: "My goal is not the portability" vs. "The other reason over portability I could think of". So portability is both not important, and is the first consideration. "save space with more specific instructions" is what compilers do best.
S.Lott
I'm not interested in portability at all; it's just the most commonly known use of VMs, and I listed what you can do with a VM. And no, compilers don't do that, at least not in this context. Think about the sum function (in Python) as an interpretive routine. This is similar.
Cheery
+1  A: 

If you want something really fast, try using LLVM. It can generate native code for most processors from a high-level program description. You can either go with your own assembly language or generate the LLVM structure directly, skipping the assembly phase, depending on what you find most convenient.

I'm not sure if it's the best fit for your problem, but it's definitely what I would use if I had to do performance-critical execution of code that can't be compiled with the rest of the program.

Laserallan
A: 

If you don't like JIT and your goal is not portability, you may be interested in Google's Native Client project. It does static analysis, sandboxing and more, and it allows the host to execute raw x86 instructions.

Dennis Cheung
+2  A: 

I see there is some confusion about portability here, so I feel obliged to clarify matters somewhat. These are my humble opinions, so you are, of course, free to object to them.

I assume you came across http://www.complang.tuwien.ac.at/forth/threading/ if you are seriously considering writing a VM, so I won't dwell upon the techniques described there.

As already mentioned, targeting a VM has some advantages, such as reduced code size, reduced compiler complexity (which often translates to faster compilation) and portability (note that the point of a VM is portability of the language, so it doesn't matter if the VM itself is not portable).

Considering the dynamic nature of your example, your VM will resemble a JIT compiler more than the other, more popular ones. So, although S.Lott missed the point in this case, his mentioning of Forth is very much on the spot. If I were to design a VM for a very dynamic language, I would separate interpretation into two stages:

  1. A producer stage which consults an AST stream on demand and transforms it into a more meaningful form (for example, taking a block and deciding whether it should be executed right away or stored somewhere for later execution), possibly introducing new kinds of tokens. Essentially, you recover context-sensitive information here that may have been lost in parsing.

  2. A consumer stage which fetches the stream generated by stage 1 and executes it blindly like any other machine. If you make it Forth-like, you can just push a stored stream and be done with it, instead of jumping the instruction pointer around.

As you say, just mimicking how the damn processor works in another way doesn't buy you any of the dynamism (or any other feature worth a damn, like security) that you require. Otherwise, you would be writing a compiler.

Of course, you can add arbitrarily complex optimizations in stage 1.

artificialidiot
+1  A: 

You can speed up your dispatch routine with an unencoded instruction set:

mov eax, [esi]          ; fetch: each instruction word is already an
add esi, 4              ;        offset into the handler code
add eax, pOpcodeTable   ; offset -> absolute handler address
jmp eax

which should have an overhead of fewer than 4 cycles per dispatch on CPUs newer than the Pentium 4.

In addition, for performance reasons it is better to increment ESI (the IP) inside each primitive routine, because chances are high that the increment can be paired with other instructions there:

mov eax, [esi]          ; ESI is now incremented inside each primitive,
add eax, pOpcodeTable   ; where it can pair with other instructions
jmp eax

~1-2 cycles of overhead.

Mat