views:

636

answers:

8

I heard that some languages go from interpreted to compiled by "unrolling the interpreter loop".

Let's say I have the following pseudo-c-code interpreter for an ast tree.

int interpret(node)
{
    switch(node) {
        case PLUS:
             return interpret(child(0))+interpret(child(1));
        case MINUS:
             return interpret(child(0))-interpret(child(1));       
    }
}

How do I unroll this loop to create a compiled program?

I see you all downvoting this like I don't know what I am talking about, but here is a quote from Wikipedia that states exactly what I am describing.

"Factor was originally only interpreted, but is now fully compiled (the non-optimizing compiler basically unrolls the interpreter loop"

+3  A: 

I'm slightly confused. I don't think 'unrolling the loop' is the right term here. Even if you refactor the code to not have any recursive calls, you will still be using an interpreter.

You can compile this program with GCC. Then you will have a compiled program, albeit the compiled program will be interpreting the AST.

One way to turn this into a compiler would be, instead of doing return interpret(child(0))+interpret(child(1));, you would generate assembly instructions which would do the addition instead, and then output those to a file.

Claudiu
+2  A: 

You don't really have a loop here since not all the calls to interpret are tail calls.

The compiler closest to yours, assuming a stack model...

int compile(node)
{
    switch(node) {
        case PLUS:
             return compile(child(0))&&compile(child(1))&&compile_op(op_plus);
        case MINUS:
             return compile(child(0))&&interpret(child(1))&&compile_op(op_minus);       
    }
}

but I think unrolling in this context is more applicable to a byte code interpreter rather than an AST interpreter. The bytecode instructions are typically interpreted in a loop. Then the "unrolling" technique is to emit the code corresponding to each bytecode instruction.

Factor is similar to FORTH. Typically FORTH has an outer interpreter that generates threaded code. The threaded code can be envisioned an array of function pointers (there are several variants, direct threaded, indirect threaded, subroutine threaded, etc.). The threaded code is executed by an inner interpreter. Unrolling the interpreter in this case refers to the inner interpreter, and is a matter of concatenating the threaded code.

Doug Currie
A: 

An interpreter scans each bytecode (or AST node) at runtime and dispatch to functions calls (typically using a switch statement in an infinite loop).

A compiler does essentially the same thing, but at compile time. The compiler scans each bytecode (or AST node) at compile time and emits code (machine code or some higher-level intermediate language like C) to call the appropriate function at runtime.

cpeterso
+8  A: 

"Unrolling a loop" normally means replacing a repetition with a sequence of actions. The loop:

for (int i = 0; i < 4; ++i) {
    a[i] = b[i] + c[i];
}

would unroll into the equivalent:

a[0] = b[0] + c[0];
a[1] = b[1] + c[1];
a[2] = b[2] + c[2];
a[3] = b[3] + c[3];

It appears to me that whoever was being quoted by Wikipedia was using the phrase in a somewhat metaphorical sense. So, in that sense...

Your sample would normally be invoked inside a interpreter that is walking a tree of AST nodes, which might look something like this:

 ASSIGN
    |
 +--+---+
 |      |
REF   MINUS
 |      |
 x   +--+---+
     |      |
    VAR    PLUS
     |      |
     a   +--+--+
         |     |
        VAR  CONST
         |     |
         b     3

and the interpret function would have additional options:

int interpret(node) {
    switch(node) {
        case PLUS:
             return interpret(child(0))+interpret(child(1));
        case MINUS:
             return interpret(child(0))-interpret(child(1));       
        case ASSIGN:
             return set(child(0), interpret(child(1));
        case VAR:
             return fetch(child(0));
        case CONST:
             return value(child(0));
        ...
    }
}

If you walk the AST with that interpet function (actually performing the operations), you're interpreting. But if the function records the actions to be performed, rather than executing them, you're compiling. In pseudocode (actually, pseudocode twice, as I'm assuming a hypothetical stack machine as the compilation target):

string compile(node) {
    switch(node) {
        case PLUS:
             return(compile(child(0))) + compile(child(1)) + ADD);
        case MINUS:
             return(compile(child(0))) + compile(child(1)) + SUB);
        case ASSIGN:
             return(PUSHA(child(0))) + compile(child(1)) + STORE);
        case REF:
             return(PUSHA(child(0)));
        case VAR:
             return(PUSHA(child(0)) + FETCH);
        case CONST:
             return(PUSHLIT + value(child(0)));
        ...
    }
}

Invoking compile on that AST (ignoring any pseudocode typos ;-) would spit out something like:

PUSHA x
PUSHA a
FETCH
PUSHA b
FETCH
PUSHLIT 3
ADD 
SUB
STORE

FWIW, I'd tend to think of that as unrolling the AST, rather than unrolling the interpreter, but won't criticize somebody else's metaphor without reading it in context.

joel.neely
Recursion is a fancy loop.
strager
You could perhaps mention what kind of code you are producing here (stack machine vs. other forms of intermediate language). Otherwise, good answer.
Konrad Rudolph
@Konrad: Thanks for the suggestion. I edited to incorporate it.
joel.neely
A: 

I think what it means is that instead of looping over the statements and executing them, you loop over the statements and output the interpreter code that would have executed.

Basically, what's happening is that the code that would execute in the interpreter loop gets inlined into a new function. The loop gets "unrolled" because when the code executes, it's not in the interpreter loop anymore, it's just a linear path through the generated function.

Matthew Crumley
+1  A: 

In this article I went through an example of automatically converting an interpreter into a compiler (albeit compiling to Scheme rather than machine code). It's the same idea others have given here, but you might find it helpful to see it automated.

Darius Bacon
+1  A: 

Factory is a stack based language, not an AST based interpreter.

I've used stack based languages for actor interpreters, so this is how the one's I've made work, which may not be entirely unlike Factor.

Each function is implemented as a function which takes a stack, and returns a stack (in my case a mutated version of the same stack, I'm not sure if Factor is functional or mutating). In my interpreters, each function also puts the continuation of the function on the top of the stack, so the interpreter knows what to do next:

So the interpreter to call the next function on the stack is something like:

for (;;)
    stack = (stack[0].function_pointer)(stack);

Considering the function foo:

def foo (x,y):
   print( add(x, y) )

add could be defined as:

pop a
pop b
stack[ return_offset ] = a + b
return stack

and foo as:

pop x
pop y
push _
push &print
push y
push x
push &add

and the stack for invoking foo(5,6) would be evolve something like this at each step in the loop:

>> foo(5,6)
[&foo, 5, 6]
[&add, 5, 6, &print, _]
[&print, 11]
=> 11
[]

A simple compiler could unroll the loop for the foo function, generating the equivalent threaded code:

compiled_foo (stack): 
    stack = begin_foo(stack) // arranges stack for call to add
    stack = add(stack)
    stack = print(stack)
    return stack
Pete Kirkham
+1  A: 

This may not be related, but also check out the second Futamura projection

http://en.wikipedia.org/wiki/Futamura_projection

which says that a compiler is just an interpreter with partial evaluation/constant folding (good in theory but not practice).

Brian