views: 69
answers: 4

Do you know of good compiler designs for the case where the output is placed directly into process memory and executed immediately after compilation?

I've looked into a few Scheme compilers and read whatever I could about V8. There are interesting JIT techniques, like inline caching, that I would like to try in my own compiler.
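
(For concreteness, here is a minimal C sketch, with hypothetical names, of the monomorphic inline caching idea: each call site remembers the last receiver type it saw and the method it resolved, so repeated calls with the same type skip the full lookup. A real JIT would patch this test directly into the emitted code.)

    #include <stdio.h>

    typedef struct { int type_tag; } Obj;
    typedef int (*Method)(Obj *self);

    static int area_square(Obj *self) { (void)self; return 4; }
    static int area_circle(Obj *self) { (void)self; return 3; }

    /* The slow path: a full (here, trivial) method lookup by type. */
    static Method lookup(int type_tag) {
        return type_tag == 0 ? area_square : area_circle;
    }

    /* One cache per call site. */
    typedef struct {
        int cached_tag;
        Method cached_method;
    } InlineCache;

    static int call_area(InlineCache *ic, Obj *receiver) {
        if (ic->cached_method && receiver->type_tag == ic->cached_tag)
            return ic->cached_method(receiver);   /* hit: no lookup */
        ic->cached_tag = receiver->type_tag;      /* miss: refill cache */
        ic->cached_method = lookup(receiver->type_tag);
        return ic->cached_method(receiver);
    }

    int main(void) {
        InlineCache ic = { 0, NULL };
        Obj square = { 0 }, circle = { 1 };
        printf("%d\n", call_area(&ic, &square));  /* miss, then cached */
        printf("%d\n", call_area(&ic, &square));  /* hit */
        printf("%d\n", call_area(&ic, &circle));  /* miss: cache refilled */
        return 0;
    }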

It's all right to answer with almost obvious things, like exploiting the fact that you compile inside the same address space the output program will execute in. I'm interested in design choices around storing, emitting, and linking programs.
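
To make "compiling into the same address space" concrete, here is a minimal sketch of the core mechanism, assuming x86-64 and a POSIX system. (Systems that enforce W^X would require mapping the page writable first and flipping it to executable with mprotect before calling it.)

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* Hand-assembled x86-64 for: mov eax, 42 ; ret */
        unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        /* Allocate a page the CPU is allowed to execute. */
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;

        memcpy(buf, code, sizeof code);

        /* In the same address space, "linking" is just casting the
           buffer to a function pointer. */
        int (*fn)(void) = (int (*)(void))buf;
        printf("%d\n", fn());   /* prints 42 */

        munmap(buf, 4096);
        return 0;
    }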

+1  A: 

Partially relevant: Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp by Peter Norvig.

This book covers a lot of stuff, with a lot of CL example code. One of the later chapters discusses compiling code; in fact, if I remember correctly, he writes a compiler for Scheme and discusses various optimization techniques.

Of course, when working in Lisps, much of the work of "compiling" is already done for you by the language. I don't remember what kind of "executable" code he created; maybe some kind of CL bytecode?

Carl Smotricz
Is that true? I thought the work done by the backend was the real work a compiler must do, and that parsing into an intermediate language was the easier problem. If not, then I'm quite happy, because I know much more about the latter.
Cheery
I'm not very qualified to answer that. I read Norvig's book, and he makes everything he does look easy. But that's only because he's a genius. Since you're directly dealing with the problem, you'll want to look for yourself.
Carl Smotricz
+1  A: 

I've read of a book called "Lisp in Small Pieces" that is reportedly good and discusses the implementation of Lisp. If memory serves, and if what I read was accurate, it might be very useful to you.

David Thornley
A: 

It seems to me that most of a compiler's work is done by the time you choose exactly where to start outputting code. Whether it generates a binary on disk or within the process's address space will be mostly invisible to most of the compiler's components. Perhaps one obvious thing you can do is use profiling information from the current process to direct your optimisation passes.

The other design choice relates to making it fast! If it's a JIT, you don't want to spend a long time waiting, I should think. Presumably this can guide the other design decisions you are forced to make. I think you're unlikely to regret prioritising compiler speed over other constraints.

So for me the salient point is that you really don't need to care too much about where the code is going to end up until you are actually at the point of putting it there. There seems to be no reason the same compiler couldn't generate normal binaries on disk as well as output to somewhere inside the current process space (with due deference to getting headers and binary formats right).
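
A minimal sketch of that separation, with hypothetical names: the backend emits bytes through a small "sink" interface, and only the sink knows whether they land in an object file on disk (to be linked later) or in an in-process buffer (callable immediately).

    #include <stdio.h>
    #include <string.h>

    typedef struct Sink {
        void (*emit)(struct Sink *self, const void *bytes, size_t n);
        void *state;
    } Sink;

    /* Disk-targeting sink: append to an object file, link later. */
    static void emit_to_file(Sink *s, const void *bytes, size_t n) {
        fwrite(bytes, 1, n, (FILE *)s->state);
    }

    /* In-process sink: copy to a buffer; in a real JIT this would be
       an executable page, callable as soon as emission finishes. */
    static void emit_to_memory(Sink *s, const void *bytes, size_t n) {
        unsigned char **cursor = s->state;
        memcpy(*cursor, bytes, n);
        *cursor += n;
    }

    /* The backend itself is identical for both targets. */
    static void codegen(Sink *out) {
        static const unsigned char code[] =
            { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };  /* mov eax,42 ; ret */
        out->emit(out, code, sizeof code);
    }

    int main(void) {
        FILE *obj = fopen("out.bin", "wb");
        if (!obj) return 1;
        Sink disk = { emit_to_file, obj };
        codegen(&disk);
        fclose(obj);

        unsigned char raw[64], *cursor = raw;
        Sink mem = { emit_to_memory, &cursor };
        codegen(&mem);
        printf("emitted %ld bytes in-process\n", (long)(cursor - raw));
        return 0;
    }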

Gian
There seems no reason? You can shortcut a lot when you compile into memory: the items you require can be loaded into memory and bound against directly, and you can entirely ignore executable file formats and linking against shared objects. On the other hand, you could add a program that sets up the environment for the output program before loading it.
Cheery
Yes, that was exactly what that last parenthesized sentence was meant to capture - "with some extra work".
Gian
A: 

Once you get down to it, code generation in a compiler is pretty much the same regardless of the source language. The complexities of different languages lie in efficiently handling the semantics of their environments (e.g. Scheme has things like closures and continuations, whereas something like BASIC does not).

But once you've decided how things like that are represented (and some of those decisions affect efficiency in terms of memory layout, accessibility, etc.), the code generation is straightforward.
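
For instance, here is a minimal C sketch (hypothetical names) of one common representation choice for closures: a code pointer paired with its captured environment. Once a layout like this is fixed, emitting the code that builds and calls closures is mechanical.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Closure {
        int (*code)(struct Closure *self, int arg);
        int captured_n;                /* the captured free variable */
    } Closure;

    static int adder_code(Closure *self, int arg) {
        return self->captured_n + arg;
    }

    /* (lambda (n) (lambda (x) (+ n x))) lowered to this layout */
    static Closure *make_adder(int n) {
        Closure *c = malloc(sizeof *c);
        c->code = adder_code;
        c->captured_n = n;
        return c;
    }

    int main(void) {
        Closure *add5 = make_adder(5);
        printf("%d\n", add5->code(add5, 37));   /* prints 42 */
        free(add5);
        return 0;
    }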

The distinction you're encountering is the difference between, say, a Scheme compiler that compiles to C and then hands that off to a C compiler (which may compile it to assembly and hand it off to an assembler), versus your compiler generating machine instructions directly in RAM.

The different phases give you an opportunity to add optimizations, and offer a separation of concerns. Generating C code can be easier than generating assembly or, especially, machine code, because the C compiler can do some heavy lifting for you (such as architecture portability).

But several systems can use C to compile code that is immediately loaded via the dynamic linking process and executed.
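
A sketch of that approach, assuming a POSIX system with a C compiler available as "cc" on the PATH: write the generated C source to a file, build it as a shared object, and let the dynamic linker map it into the running process.

    /* link with -ldl on older glibc */
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Emit generated C source to disk. */
        FILE *src = fopen("/tmp/jit_gen.c", "w");
        if (!src) return 1;
        fputs("int generated(int x) { return x * 2 + 1; }\n", src);
        fclose(src);

        /* Let the system C compiler do the heavy lifting. */
        if (system("cc -shared -fPIC -o /tmp/jit_gen.so /tmp/jit_gen.c"))
            return 1;

        /* "Immediate loading": the dynamic linker maps the code in. */
        void *lib = dlopen("/tmp/jit_gen.so", RTLD_NOW);
        if (!lib) return 1;
        int (*generated)(int) = (int (*)(int))dlsym(lib, "generated");
        if (!generated) return 1;

        printf("%d\n", generated(20));   /* prints 41 */
        dlclose(lib);
        return 0;
    }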

Compiling to an intermediate language (à la the JVM or CLR) can simplify things as well; you then JIT that. Compiling JVM bytecodes is not particularly difficult, as the JVM is a simple stack machine. OPTIMIZING that code is a different problem, but converting it to machine code is pretty straightforward. The CLR is different because it captures more of the semantics of the code being compiled; its bytecode is more like an intermediate phase of compilation, in contrast to JVM bytecode, which is actual code designed to be executed as is.
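
To illustrate how direct that conversion can be, here is a minimal sketch assuming x86-64 (little-endian) and a hypothetical four-instruction stack bytecode: each stack op maps to a short, fixed machine-code sequence that reuses the hardware stack as the operand stack.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_RET };

    static unsigned char *emit(unsigned char *p, const void *bytes, size_t n) {
        memcpy(p, bytes, n);
        return p + n;
    }

    /* Translate bytecode to x86-64 in one linear pass. */
    static void compile(const int *bc, unsigned char *out) {
        unsigned char *p = out;
        for (;;) {
            switch (*bc++) {
            case OP_PUSH: {                /* push imm32 */
                int imm = *bc++;
                p = emit(p, "\x68", 1);
                p = emit(p, &imm, 4);
                break;
            }
            case OP_ADD:   /* pop rax; pop rcx; add rax,rcx; push rax */
                p = emit(p, "\x58\x59\x48\x01\xC8\x50", 6);
                break;
            case OP_MUL:   /* pop rax; pop rcx; imul rax,rcx; push rax */
                p = emit(p, "\x58\x59\x48\x0F\xAF\xC1\x50", 7);
                break;
            case OP_RET:   /* pop rax; ret */
                p = emit(p, "\x58\xC3", 2);
                return;
            }
        }
    }

    int main(void) {
        /* (2 + 3) * 4 */
        int bc[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                     OP_PUSH, 4, OP_MUL, OP_RET };

        unsigned char *buf = mmap(NULL, 4096,
                                  PROT_READ | PROT_WRITE | PROT_EXEC,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        compile(bc, buf);

        long (*fn)(void) = (long (*)(void))buf;
        printf("%ld\n", fn());   /* prints 20 */
        return 0;
    }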

In the end, it's all arrays of data in RAM somewhere; whether that data is machine code or VM code is a matter of detail. Whether that array is in RAM proper or mapped in from a file by the VM is another detail. With virtual memory, your system probably doesn't care one way or the other.

So it boils down to focusing on the code generation proper. Once you're comfortable with that, directing it to other compilers, to RAM, or to files is a minor step.

Will Hartung