I am interested in optimization methods or general bytecode designs that might help speed up execution with a VM, compared to interpreting an AST.
The main win of bytecode over AST interpretation is operation dispatch cost; for highly optimised interpreters this starts to become a real problem. "Dispatch" is the term used to describe the overhead required to start executing an operation (such as arithmetic, property access, etc.).
A fairly normal AST based interpreter would look something like this:
class ASTNode {
public:
    virtual double execute() = 0;
};
class NumberNode : public ASTNode {
public:
    virtual double execute() { return m_value; }
    double m_value;
};
class AddNode : public ASTNode {
public:
    virtual double execute() { return left->execute() + right->execute(); }
    ASTNode* left;
    ASTNode* right;
};
So executing the code for something as simple as 1+1
requires 3 virtual calls. Virtual calls are very expensive (in the grand scheme of things) due to the multiple indirections needed to make the call, and the general cost of making a call in the first place.
In a bytecode interpreter you have a different dispatch model -- rather than virtual calls you have an execution loop, akin to:
while (1) {
    op = *pc++; // fetch the next instruction and advance
    switch (op.type) {
    case op_add:
        // Efficient interpreters use "registers" rather than
        // a stack these days, but the example code would be more
        // complicated
        push(pop() + pop());
        continue;
    case op_end:
        return pop();
    }
}
This still has a reasonably expensive dispatch cost vs. native code, but it is much faster than virtual dispatch. You can further improve performance using a GCC extension called "computed goto", which allows you to remove the switch dispatch, reducing total dispatch cost to essentially a single indirect branch.
In addition to improving dispatch costs, bytecode-based interpreters have a number of additional advantages over AST interpreters, mostly due to the ability of bytecode to jump "directly" to other locations, as a real machine would. For example, imagine a snippet of code like this:
while (1) {
...statements...
if (a)
break;
else
continue;
}
To implement this correctly, every time a statement is executed you would need to indicate whether execution is meant to stay in the loop or stop, so the execution loop becomes something like:
while (condition->execute() == true) {
    for (i = 0; i < statements->length(); i++) {
        result = statements[i]->execute();
        if (result.type == BREAK)
            goto done;  // leave the outer loop as well
        if (result.type == CONTINUE)
            break;      // skip the rest and re-test the condition
    }
}
done:
As you add more forms of flow control this signalling becomes more and more expensive. Once you add exceptions (i.e. flow control that can happen anywhere) you start needing to check for these things even in the middle of basic arithmetic, leading to ever-increasing overhead. If you want to see this in the real world I encourage you to look at the ECMAScript spec, where they describe the execution model in terms of an AST interpreter.
In a bytecode interpreter these problems basically go away, as the bytecode is able to express control flow directly rather than indirectly through signalling -- e.g. continue is simply converted into a jump instruction, and you only pay that cost if it's actually hit.
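For instance, the while loop above might compile to bytecode along these lines (the instruction names are illustrative, not from any particular VM):

```
loop_start:
    ...statements...
    jump_if_false a, else_branch   ; if (a)
    jump loop_end                  ; break -- a direct jump, no signalling
else_branch:
    jump loop_start                ; continue -- jump back to the loop head
loop_end:
```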
Finally, an AST interpreter is by definition recursive, and so has to be prevented from overflowing the system stack, which puts very heavy restrictions on how deeply you can recurse in your code. Something like:
1+(1+(1+(1+(1+(1+(1+(1+1)))))))
has 8 levels of recursion (at least) in the interpreter -- this can be a very significant cost. Older versions of Safari (pre-SquirrelFish) used an AST interpreter, and for this reason JS was allowed only a couple of hundred levels of recursion, vs. the thousands allowed in modern browsers.