views:

1568

answers:

9

I've read the whole Dragon Book recently (just for fun, I'm not really planning to implement an actual compiler), and I was left with this big question dangling in my head.

What is different between implementing a compiler and an interpreter?

To me a compiler is made up of:

  • Lexer
  • Parser (which builds the syntax tree)
  • Generate Intermediate code (like 3 address code)
  • Do all these crazy things to optimize if you want :-)
  • Generate "assembly" or "native code" from the 3 address code.

Now, obviously, the interpreter also has the same lexer and parser as the compiler.
But what does it do after that?

  • Does it "read" the syntax tree and execute it directly? (kind of like having an instruction pointer pointing to the current node in the tree, and the execution is one big tree traversal plus the memory management for the call stack) (and if so, how does it do it? I'm hoping the execution is better than a huge switch statement that checks what type of node it is)

  • Does it generate 3 address code and interpret that? (if so, how does it do it? Again, I'm looking for something more elegant than a mile long switch statement)

  • Does it generate real native code, load it into memory, and make it run? (at which point I'm guessing it's not an interpreter anymore, but more like a JIT compiler)

Also, at which point does the concept of a "virtual machine" come in? What do you use a virtual machine for in a language? (To be clear about my level of ignorance: to me a virtual machine is VMware; I have no idea how the concept of a VM applies to programming languages / executing programs.)

As you can see, my question is quite broad. I'm not only looking for which method is used; I mostly want to first understand the big concepts, and then get into how they work in detail. I want the ugly, raw details. Obviously, this is more a quest for references to things to read than an expectation that you answer all these details here.

Thanks!
Daniel


EDIT: Thank you for your answers so far. I realized my title was misleading though. I understand the "functional" difference between a compiler and an interpreter.
What I'm looking for is the difference in how you implement an interpreter versus a compiler.
I understand now how a compiler is implemented; the question is how an interpreter differs from that.

For example: VB6 is clearly both a compiler and an interpreter. I understand now the compiler part. However, I cannot grasp how, when running inside the IDE, it could let me stop the program at any arbitrary point, change the code, and resume execution with the new code. That's just one tiny example; it's not the answer I'm looking for. What I'm trying to understand, as I explain below, is what happens after I have a parse tree. A compiler will generate new code from it in the "target" language. What does an interpreter do?

Thank you for your help!

+4  A: 

Both have much in common (e.g. the lexer and parser) and there is disagreement on the difference. I look at it this way:

The classical definition would be that a compiler parses and translates a stream of symbols into a stream of bytes that can be run directly by the CPU, whereas an interpreter does the same thing but translates them into a form that must be executed by a piece of software (e.g. the JVM or CLR).

Yet people call 'javac' a compiler, so the informal definition of a compiler is something that must be run on source code as a separate step, whereas interpreters have no 'build' step (e.g. PHP, Perl).

cletus
I'll +1 you but there are many complexities. E.g., the Python interpreter has a build step (it compiles .py to .pyc when needed). In addition, some CPUs have microcode as their "real" language, choosing to emulate the "visible" CPU on top of that.
paxdiablo
An interpreter doesn't transform the input to "a form that must be executed on a piece of software" -- it may or may not transform the input internally, but the main point is that it ends up *executing* the code! Right there and then!
j_random_hacker
Most people would view PHP as an interpreter but I assure you it compiles plaintext source to an intermediate form. After all, that's how PHP accelerators (opcode caches) like APC work.
cletus
+3  A: 

It's not as clear-cut as it used to be. It used to be: build a parse tree, bind it, and execute it (often binding at the last second).

BASIC was mostly done this way.
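To illustrate what "bind" can mean here, a minimal sketch (my own toy example, not how any real BASIC was implemented): variable names are resolved to storage slots in a separate pass, and the execute step then works with slots instead of looking names up each time.

#include <stdio.h>
#include <string.h>

/* Toy "bind" pass: resolve each variable name to a slot index once,
   so the execute step never searches by name again. */
#define MAX_VARS 8
static const char *names[MAX_VARS];
static double slots[MAX_VARS];
static int nvars = 0;

static int bind(const char *name) {
    for (int i = 0; i < nvars; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    names[nvars] = name;
    return nvars++;
}

int main(void) {
    /* "program":  x = 2;  y = x + 3;  print y  */
    int x = bind("x"), y = bind("y");   /* bind step */

    slots[x] = 2;                       /* execute step, using slots */
    slots[y] = slots[x] + 3;
    printf("%g\n", slots[y]);           /* prints 5 */
    return 0;
}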

You could claim that things that run bytecode (Java/.NET) without doing a JIT are interpreters - but not in the traditional sense, since you still have to 'compile' to bytecode.

The old-school difference was: if it generates CPU code, it's a compiler. If you run it directly in your editing environment and can interact with it while editing, it's an interpreter.

That was far less formal than the actual Dragon book - but I hope it's informative.

Joe
Can you provide more detail as to what you mean by "binding it"? Also, after binding, how is it executed?
Daniel Magliola
+7  A: 

A compiler is a program that translates a program in one programming language to a program in another programming language. That's it - plain and simple.

An interpreter translates a programming language into its semantic meaning.

An x86 chip is an interpreter for x86 machine language.

javac is a compiler from Java to Java Virtual Machine bytecode. java, the executable application, is an interpreter for the JVM.

Some interpreters share some elements of compilation in that they may translate one language into another internal language that is easier to interpret.
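As a toy illustration of that "internal language" idea (my own sketch, not tied to any real interpreter): the front end would produce something like the hard-coded bytecode array below, and the interpreter proper is just the small loop that runs it.

#include <stdio.h>

/* Bytecode for (2 + 3) * 4, in a made-up two-slot instruction format:
   opcode, operand, opcode, operand, ... */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

int main(void) {
    const double code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, 0,
                            OP_PUSH, 4, OP_MUL, 0, OP_PRINT, 0, OP_HALT, 0 };
    double stack[16];
    int sp = 0;

    for (int pc = 0; ; pc += 2) {           /* the interpreter loop */
        switch ((int)code[pc]) {
        case OP_PUSH:  stack[sp++] = code[pc + 1]; break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:   sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_PRINT: printf("%g\n", stack[sp - 1]); break;   /* prints 20 */
        case OP_HALT:  return 0;
        }
    }
}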

Interpreters usually, but not always, feature a read-eval-print loop.

plinth
Read-eval-print loop: that sounds interesting, I'll do some reading on that. Thanks!
Daniel Magliola
+1  A: 

If my experience indicates anything:

  1. Interpreters don't try to reduce or process the AST any further; each time a block of code is referenced, the relevant AST node is traversed and executed (see the sketch at the end of this answer). Compilers traverse a block at most a few times to generate executable code in a definite place, and are then done with it.
  2. An interpreter's symbol table holds values and is consulted during execution; a compiler's symbol table holds the locations of variables, and no symbol table exists at execution time.

In short, the difference may be as simple as the difference between

case '+':
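    /* interpreter: look up the operands and do the addition right now */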
    symtbl[var3] = symtbl[var1] + symtbl[var2];
    break;

and

case '+':
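    /* compiler: emit target-language code that will do the addition later */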
    printf("%s = %s + %s;",symtbl[var3],symtbl[var1],symtbl[var2]);
    break;

(It doesn't matter if you target another language or (virtual) machine instructions.)
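To make the "traverse the AST node and execute it" part of point 1 concrete, here is a self-contained sketch (my own illustration, not production code): each node is a tagged struct and eval() simply recurses over the tree.

#include <stdio.h>

/* A node of the syntax tree: either a number literal or a binary operator. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    double value;                 /* used by NODE_NUM          */
    struct Node *left, *right;    /* used by NODE_ADD/NODE_MUL */
} Node;

/* Interpretation is one big tree traversal. */
static double eval(const Node *n) {
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval(n->left) + eval(n->right);
    case NODE_MUL: return eval(n->left) * eval(n->right);
    }
    return 0.0;  /* unreachable for well-formed trees */
}

int main(void) {
    /* (2 + 3) * 4, built by hand here instead of by a parser */
    Node two   = { NODE_NUM, 2.0, NULL, NULL };
    Node three = { NODE_NUM, 3.0, NULL, NULL };
    Node four  = { NODE_NUM, 4.0, NULL, NULL };
    Node sum   = { NODE_ADD, 0.0, &two, &three };
    Node prod  = { NODE_MUL, 0.0, &sum, &four };

    printf("%g\n", eval(&prod));   /* prints 20 */
    return 0;
}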

artificialidiot
So basically, it's tree traversal, plus an insanely huge switch statement?
Daniel Magliola
Not necessarily a visible switch, if your implementation language supports classes with virtual methods. But yes, the difference is what you do while traversing the tree.
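For instance, a minimal C sketch of that idea (C has no virtual methods, so a function pointer stored in each node plays that role; this is just an illustration): dispatch goes through the pointer and no central switch appears.

#include <stdio.h>

/* Each node knows how to evaluate itself via a function pointer,
   which stands in for a virtual method. */
typedef struct Node Node;
struct Node {
    double (*eval)(const Node *self);
    double value;
    const Node *left, *right;
};

static double eval_num(const Node *n) { return n->value; }
static double eval_add(const Node *n) {
    return n->left->eval(n->left) + n->right->eval(n->right);
}

int main(void) {
    Node two   = { eval_num, 2.0, NULL, NULL };
    Node three = { eval_num, 3.0, NULL, NULL };
    Node sum   = { eval_add, 0.0, &two, &three };
    printf("%g\n", sum.eval(&sum));   /* prints 5 */
    return 0;
}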
artificialidiot
+2  A: 

short answer:

  • a compiler converts source-code into an executable format for later execution
  • an interpreter evaluates source-code for immediate execution

there is a great deal of leeway in how either is implemented. It is possible for an interpreter to generate native machine code and then execute that, while a compiler targeting a virtual machine may generate p-code instead of machine code. Threaded interpreted languages like Forth look up keywords in a dictionary and execute their associated native-code functions immediately.
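A drastically simplified sketch of that dictionary-lookup style (my own illustration; real Forths differ greatly in the details): words are looked up in a table and their native handlers run immediately, with anything else treated as a number pushed on a data stack.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static double stack[64];
static int sp = 0;

/* Native handlers for two "words". */
static void w_add(void) { double b = stack[--sp], a = stack[--sp]; stack[sp++] = a + b; }
static void w_dot(void) { printf("%g\n", stack[--sp]); }

/* The dictionary: word name -> native-code function. */
struct Word { const char *name; void (*code)(void); };
static const struct Word dict[] = { { "+", w_add }, { ".", w_dot } };

static void interpret(const char *word) {
    for (size_t i = 0; i < sizeof dict / sizeof dict[0]; i++)
        if (strcmp(word, dict[i].name) == 0) { dict[i].code(); return; }
    stack[sp++] = atof(word);   /* not a known word: treat it as a number */
}

int main(void) {
    const char *program[] = { "2", "3", "+", "." };   /* prints 5 */
    for (size_t i = 0; i < 4; i++)
        interpret(program[i]);
    return 0;
}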

compilers can generally optimize better because they have more time to study the code and produce a file for later execution; interpreters have less time to optimize because they tend to execute the code "as is" upon first sight

an interpreter that optimizes in the background, learning better ways to execute the code, is also possible

summary: the difference really comes down to 'prepare the code for later execution' or 'execute the code right now'

Steven A. Lowe
Now, if an interpreter generates machine code and executes it, how is that different from a compiler? Is it just that it compiles every single time?
Daniel Magliola
Also, do you have more information as to how an interpreter is implemented? Something I can read? Thanks!
Daniel Magliola
@[Daniel Magliola]: re #1: yes, it compiles every single time and does not save the results of compilation for reuse; few interpreters actually do this, though - it's very expensive to redo all that work every time
Steven A. Lowe
@[Daniel Magliola]: re #2: the core of an interpreter is the 'evaluate' or 'execute' function that executes the input source code immediately; there are many ways to do this. Your best bet is to look up how the interpreter works for each language you are interested in - BASIC, Python, Lisp, etc.
Steven A. Lowe
+4  A: 

A program is a description of work you want done.

A compiler converts a high-level description into a simpler description.

An interpreter reads a description of what to do and does the work.

  • Some interpreters (e.g. Unix shells) read the description one small piece at a time and act on each piece as they see it; some (e.g. Perl, Python) read the entire description, internally convert it to a simpler form and then act on that.
  • Some interpreters (e.g. Java's JVM, or a Pentium 4 chip) only understand a very simple description language that is too tedious for humans to work with directly, so humans use compilers to convert their high-level descriptions to this language.

Compilers never do the work. Interpreters always do the work.

j_random_hacker
It's interesting that you mention a processor as an interpreter. That does give me a new point of view into this.
Daniel Magliola
Thanks, actually plinth was the first to describe it that way.
j_random_hacker
A: 

Given your list of steps:

  • Lexer
  • Parser (which builds the syntax tree)
  • Generate Intermediate code (like 3 address code)
  • Do all these crazy things to optimize if you want :-)
  • Generate "assembly" or "native code" from the 3 address code.

A very simple interpreter (like early BASICs or Tcl) would only perform steps one and two, one line at a time, and then throw away most of the results while proceeding to the next line to be executed. The other three steps would never be performed at all.
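Roughly like this sketch (purely illustrative, with a made-up two-statement language; real BASICs and Tcl are of course far more involved): each line is scanned and acted on immediately, and nothing from it is kept for the next line.

#include <stdio.h>

int main(void) {
    char line[256];

    /* One line at a time: "lex/parse" it, act on it, throw the result away. */
    while (fgets(line, sizeof line, stdin)) {
        double a, b;
        if (sscanf(line, "PRINT %lf", &a) == 1)
            printf("%g\n", a);
        else if (sscanf(line, "ADD %lf %lf", &a, &b) == 2)
            printf("%g\n", a + b);
        else if (line[0] != '\n')
            fprintf(stderr, "syntax error: %s", line);
        /* nothing survives to the next iteration */
    }
    return 0;
}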

Darron
+1  A: 

In regard to this part of your question, which the other answers haven't really addressed:

Also, at which point does the concept of "virtual machine" cut in? What do you use a virtual machine for in a language?

Virtual machines like the JVM or the CLR are a layer of abstraction that allows you to reuse JIT compilation, garbage collection and other implementation details across completely different languages that are compiled to run on the VM.

They also help you make the language specification more independent from the actual hardware. For example, while C code is theoretically portable, you constantly have to worry about things like endianness, type size and variable alignment if you actually want to produce portable code. Whereas with Java, the JVM is very clearly specified in these regards, so the language designer and its users don't have to worry about them; it's the job of the JVM implementer to implement the specified behaviour on the actual hardware.
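For example, here is a small C illustration of two of those portability worries (its exact output depends on the machine you run it on): byte order and type sizes are whatever the hardware and ABI say, whereas a JVM program always sees the sizes and behaviour the JVM specification promises.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x11223344;
    unsigned char *bytes = (unsigned char *)&value;

    /* 0x44 on a little-endian CPU, 0x11 on a big-endian one. */
    printf("first byte in memory: 0x%02x\n", bytes[0]);

    /* typically 4 on 32-bit ABIs and 64-bit Windows, 8 on most 64-bit Unix ABIs */
    printf("sizeof(long) = %zu\n", sizeof(long));
    return 0;
}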

Michael Borgwardt
Thank you, that was very clear. Now, to make sure I understood: the VM is just a "target machine" to generate code for, which will pick up your "compiled" code and either compile it for the real machine or interpret it and run it, right? It's like splitting compilation into 2 independent steps?
Daniel Magliola
Yes, that's pretty much it.
Michael Borgwardt
A: 

Once a parse-tree is available, there are several strategies:

1) directly interpret the AST (Ruby, WebKit's original interpreter)
2) code transformation into bytecodes or machine code

To achieve Edit-and-Continue, the program counter or instruction pointer has to be recalculated and moved. This requires cooperation from the IDE, because code may have been inserted before or after the little yellow arrow.

One way this could be done is to embed the position of the program counter in the parse tree. For instance, there might be a special statement called "break". The program counter only needs to be positioned after the "break" instruction to continue running.

In addition, you have to decide what to do about the current stack frame (and the variables on the stack): perhaps pop the current frame and copy the variables over, or keep the stack but patch in a GOTO and a RETURN to the current code.
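A toy sketch of the "program counter lives in the tree, not in machine code" idea (my own illustration; real edit-and-continue is far more involved): the program counter below is just an index into a list of statements, so the list can be rebuilt from edited source and execution resumed from the saved index.

#include <stdio.h>

/* A "statement" is just a message to print in this toy. */
struct Stmt { const char *text; };

/* Run statements starting at *pc, pausing when the breakpoint index is hit.
   Because *pc is an index rather than a machine address, the statement list
   can be swapped out (after an edit) and execution resumed from the same pc. */
static void run(const struct Stmt *prog, int count, int *pc, int break_at) {
    while (*pc < count) {
        if (*pc == break_at)
            return;                 /* paused: hand control back to the "IDE" */
        printf("%s\n", prog[*pc].text);
        (*pc)++;
    }
}

int main(void) {
    struct Stmt before[] = { {"step 1"}, {"step 2"},          {"step 3"} };
    struct Stmt after[]  = { {"step 1"}, {"step 2 (edited)"}, {"step 3"} };
    int pc = 0;

    run(before, 3, &pc, 1);   /* runs "step 1", then pauses at the breakpoint */
    run(after,  3, &pc, -1);  /* code edited; resumes with "step 2 (edited)"  */
    return 0;
}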

Chui Tey