views: 335
answers: 7
What techniques promote efficient opcode dispatch to make a fast interpreter? Are there some techniques that only work well on modern hardware, and others that no longer work well because of hardware advances? What trade-offs must be made between ease of implementation, speed, and portability?

I'm pleased that Python's C implementation is finally moving beyond a simple switch (opcode) {...} implementation for opcode dispatch to indirect threading as a compile-time option, but I'm less pleased that it took them 20 years to get there. Maybe if we document these strategies on stackoverflow the next language will get fast faster.
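
For reference, here is a minimal sketch of the kind of switch-based dispatch loop I mean. The opcode names, stack layout, and run() function are made up for illustration; the point is that every opcode funnels through the single indirect branch the switch compiles to:

#include <stdio.h>

/* Hypothetical opcodes and VM, invented just for this sketch */
enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int *pc)
{
    int stack[64], *sp = stack;
    for (;;) {
        switch (*pc++) {                      /* one shared dispatch branch */
        case OP_PUSH:  *sp++ = *pc++;            break;
        case OP_ADD:   sp--; sp[-1] += sp[0];    break;
        case OP_PRINT: printf("%d\n", sp[-1]);   break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    const int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);   /* prints 5 */
    return 0;
}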

+3  A: 

Just-in-time compilation is one.

Jason S
I am intrigued by this "JIT" compilation, but what does it buy me? Will the program begin execution as quickly as an interpreter that does not do JIT? Will it use more memory? Should I JIT the entire program, or only the hot spots? Do I have to do extra work to run on ARM as well as x86?
joeforker
I may have misinterpreted your question... I don't know anything about how to actually do JIT, just that it's now used a lot in bytecode and other interpreters (Java, JavaScript, and MATLAB all use it).
Jason S
I'm looking for the name of the technique, a short explanation, and a reason why you might choose that particular technique. JIT uses more memory, probably takes longer to start up, but it can be super fast. Biggest drawback is that the JIT is platform dependent.
joeforker
+2  A: 

Before you start anything, check out Lua.

It's small (150 KB), pure ANSI C, and works on anything that has a C compiler. Very fast.

And most important - source code is clean and readable. Worth checking out.

dmajkic
lua is wonderful. I especially enjoyed the paper about their implementation of number- and hash- indexed arrays in the same data structure. Put zero-based arrays in the next one and I'm onboard :-)
joeforker
A: 

Benchmarking is a good technique for making anything fast on a given platform. Test, refine, test again, improve.

I don't think you can get a better answer than that. There are lots of techniques for building interpreters. But here's a tip: don't make trade-offs; just decide what you really need and pursue those goals.

Cheery
Trade-offs are unavoidable. C is fast but you can only run the binary on one platform. Java runs on lots of platforms but needs a big smart runtime.
joeforker
+1  A: 

One big win is to store the source code in an intermediate form, rather than redoing lexical analysis and parsing during execution.

This can range all the way from just storing the tokens, through Forth-style threaded code, and on to JIT compilation.
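
As a rough sketch of the simplest end of that range (the one-digit "language", compile_sum(), and run_sum() are invented purely for illustration): the source text is scanned once into a flat array, and every later execution walks only the array.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* compile_sum() scans text like "1+2+3" once and stores the operands
   in a flat array; run_sum() only ever touches that array. */
typedef struct { int *ops; size_t len; } Code;

static Code compile_sum(const char *src)
{
    Code c = { malloc(strlen(src) * sizeof(int)), 0 };
    for (const char *p = src; *p; p++)
        if (*p >= '0' && *p <= '9')
            c.ops[c.len++] = *p - '0';   /* keep the operands, drop the text */
    return c;
}

static int run_sum(Code c)
{
    int total = 0;
    for (size_t i = 0; i < c.len; i++)   /* no lexing or parsing here */
        total += c.ops[i];
    return total;
}

int main(void)
{
    Code c = compile_sum("1+2+3");
    printf("%d\n", run_sum(c));          /* 6 */
    printf("%d\n", run_sum(c));          /* run again without re-parsing */
    free(c.ops);
    return 0;
}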

Darron
+1  A: 

The question is a bit vague, but it seems you're asking about writing an interpreter.

Interpreters typically utilize traditional parsing components: lexer, parser, and abstract syntax tree (AST). This allows the designer to read and interpret valid syntax and build a tree structure of commands with associated operators, parameters, etc.

Once the input has been tokenized and parsed into an AST, the interpreter can begin executing by traversing the tree.
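
As a sketch of what that traversal can look like (the Node type and eval() here are invented for illustration; a real AST has many more node kinds):

#include <stdio.h>

/* Toy AST: leaves hold numbers, inner nodes add their children. */
typedef struct Node {
    enum { NODE_NUM, NODE_ADD } kind;
    int value;                       /* used when kind == NODE_NUM */
    struct Node *left, *right;       /* used when kind == NODE_ADD */
} Node;

/* The interpreter "runs" the program by recursively walking the tree. */
static int eval(const Node *n)
{
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval(n->left) + eval(n->right);
    }
    return 0;
}

int main(void)
{
    Node two   = { NODE_NUM, 2, NULL, NULL };
    Node three = { NODE_NUM, 3, NULL, NULL };
    Node sum   = { NODE_ADD, 0, &two, &three };   /* (2 + 3) */
    printf("%d\n", eval(&sum));                   /* prints 5 */
    return 0;
}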

There are many options, but I've recently used ANTLR as a parser generator that can build parsers in various target languages, including C/C++ and C#.

spoulson
+2  A: 

Indirect threading is a strategy where each opcode implementation has its own JMP to the next opcode. The patch to the Python interpreter looks something like this:

add:
    result = a + b;                            /* opcode body */
    /* dispatch: jump straight to the next opcode's implementation */
    goto *opcode_targets[*next_instruction++];

opcode_targets maps each instruction in the language's bytecode to the address of that opcode's implementation. This is faster because the processor's branch predictor can make a different prediction for each bytecode's dispatch branch, in contrast to a switch statement, where every opcode is dispatched through a single indirect branch instruction.

The compiler must support computed goto for this to work, which mostly means gcc.
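
To make the fragment above concrete, here is a minimal self-contained sketch of indirect-threaded dispatch using computed goto. The toy opcode set, stack, and run() are invented for illustration; CPython's real opcode_targets table is generated and far larger.

#include <stdio.h>

/* Invented toy opcode set for this sketch */
enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int *pc)
{
    /* One label per opcode; the table maps opcode number -> implementation
       address. &&label ("labels as values") is the gcc extension. */
    static void *opcode_targets[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
    int stack[64], *sp = stack;

#define DISPATCH() goto *opcode_targets[*pc++]
    DISPATCH();

op_push:
    *sp++ = *pc++;
    DISPATCH();                 /* every opcode ends in its own indirect branch */
op_add:
    sp--; sp[-1] += sp[0];
    DISPATCH();
op_print:
    printf("%d\n", sp[-1]);
    DISPATCH();
op_halt:
    return;
#undef DISPATCH
}

int main(void)
{
    const int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);   /* prints 5 */
    return 0;
}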

Direct threading is similar, but in direct threading the array of opcode numbers is replaced with an array of pointers to the opcode implementations, like so:

goto *next_opcode_target++;
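
Here is a corresponding sketch of direct threading with the same invented toy opcodes: the program array holds the implementation addresses directly, so the opcode-to-address translation happens once at load time and dispatch skips the table lookup. Since label addresses only exist inside the function that defines the labels, the translation has to happen in run() itself (or be handed out by it).

#include <stdio.h>
#include <stdint.h>

static void run(void)
{
    int stack[64], *sp = stack;
    void *program[7], **ip;

    /* "Load" step: build the threaded code from label addresses, once. */
    program[0] = &&op_push; program[1] = (void *)(intptr_t)2;
    program[2] = &&op_push; program[3] = (void *)(intptr_t)3;
    program[4] = &&op_add;
    program[5] = &&op_print;
    program[6] = &&op_halt;

    ip = program;
    goto **ip++;

op_push:
    *sp++ = (int)(intptr_t)*ip++;   /* operand stored inline in the stream */
    goto **ip++;
op_add:
    sp--; sp[-1] += sp[0];
    goto **ip++;
op_print:
    printf("%d\n", sp[-1]);
    goto **ip++;
op_halt:
    return;
}

int main(void)
{
    run();   /* prints 5 */
    return 0;
}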

These techniques are only a win because modern processors are pipelined and must flush the pipeline (slow) on a mispredicted branch. Processor designers added branch prediction to avoid flushing the pipeline so often, but prediction only helps branches that tend to take a particular path.

joeforker
true for processors that do branch prediction, including x86, x86_64, and so on.
Cheery
@Cheery it's not the architecture but the specific processor model that includes branch prediction, but you would be hard-pressed to be reading stackoverflow on a processor that doesn't implement it.
joeforker
+2  A: 

There are a number of papers on different kinds of dispatch:

M. Anton Ertl and David Gregg, Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters, in Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI 03), pp. 278-288, San Diego, California, June 2003.

M. Anton Ertl and David Gregg, The behaviour of efficient virtual machine interpreters on modern architectures, in Proceedings of the 7th European Conference on Parallel Computing (Europar 2001), pp. 403-412, LNCS 2150, Manchester, August 2001.

An excellent summary is provided by Yunhe Shi in his PhD thesis.

Also, someone discovered a new technique a few years ago which is valid ANSI C.

Paul Biggar