I'm going to try to write a compiler for a dynamic language, preferably one that targets an existing virtual machine: I don't (yet) want to deal with garbage collection and the myriad other concerns a good VM handles for you. What VMs do you suggest?

I'm on Linux, so I don't know if .NET (via Mono) is that good an idea. I've heard that Parrot is good for dynamic languages, but I haven't heard of any language that uses it. Should I invent my own? Does LLVM even count as a VM I should compile against, or is it as hard as straight x86?

Also, what pros and cons are there to stack-based vs register-based VMs?

Performance and tool support would be important. I'll be writing the compiler in Haskell, so a good interface with that is a plus.

+7  A: 

The JVM (Java) and the CLR (.NET) seem to be the two most common targets for this, as they both handle most of these issues for you. Both provide fairly straightforward instruction sets to work with.

The CLR has one advantage - it was really designed with the goal of supporting multiple languages from the start, and it's (IMO) slightly easier to work with, especially if you're not going to be writing a language that fits into the original "mold" of the initial languages targeting that runtime. Mono works well enough that I wouldn't shy away from a CLR target because of it.

Reed Copsey
But do dynamic languages, a la Python, have to use evil workarounds to work? .NET was, after all, designed for C#... Also, how is tool support for non-C# languages? I mean, I'm sure there will be bytecode-level debugging, but how easy would it be to write higher-level tools?
pavpanchekha
@pavpanchekha: The CLR wasn't really designed specifically for C# - it had other languages (such as VB.NET) in mind right from the start. With the DLR, too, it's even nicer - see IronPython, IronRuby, VB.NET, C#, and all of the languages here: http://en.wikipedia.org/wiki/Microsoft_.NET_Languages
Reed Copsey
That's the main reason I said the CLR has some advantages here - it was designed to be "language neutral" from day 1. The JVM can be used this way too, but it was designed for Java from the start...
Reed Copsey
+1  A: 

.NET has the Dynamic Language Runtime, as mentioned by Reed Copsey. But I don't even know the CLR, much less the DLR, so I can't say anything about either. LLVM should be nicer than plain x86, but it's still low level. I can't say much about it either - I've only glanced at it a few times.

I looked into Parrot, though. The idea itself is pretty great, and the implementation looks sound. If I ever make a dynamic language, I'm pretty sure it will target Parrot. PIR (the Parrot intermediate representation) is very high-level for a VM. You get syntactic sugar (arithmetic operators, assignments; calling subroutines and returning from them is a piece of cake, ...), you don't have to manage exact register numbers but can just take as many registers as you want and number them however you like, and you even have named variables!

If I had to choose, I assume I'd prefer a register-based VM. Research indicates that these trade larger bytecode for faster execution, which suits me fine. Plus, overly complex stack operations tend to melt my brain when I try to comprehend them - register-based operations come more naturally, IMHO.
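To make the stack vs register difference concrete, here is a rough sketch in Python that evaluates `a + b * c` on two toy interpreters. The instruction names (`load`, `mul`, `add`), the tuple encoding, and the functions `run_stack` and `run_register` are all made up for illustration; they are not the format of Parrot, the JVM, or any other real VM.

```python
# Hypothetical toy instruction sets, invented for this comparison only.

def run_stack(code, env):
    """Stack machine: operands are implicit, taken from the top of the stack."""
    stack = []
    for op, *args in code:
        if op == "load":
            stack.append(env[args[0]])
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


def run_register(code, env, result):
    """Register machine: every instruction names its destination and operands."""
    regs = dict(env)  # variables start out living in named registers
    for op, dst, lhs, rhs in code:
        if op == "mul":
            regs[dst] = regs[lhs] * regs[rhs]
        elif op == "add":
            regs[dst] = regs[lhs] + regs[rhs]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs[result]


env = {"a": 2, "b": 3, "c": 4}

# a + b * c on the stack machine: five short instructions.
stack_code = [("load", "a"), ("load", "b"), ("load", "c"), ("mul",), ("add",)]

# The same expression on the register machine: two wider instructions.
register_code = [("mul", "t0", "b", "c"), ("add", "t1", "a", "t0")]

stack_result = run_stack(stack_code, env)
register_result = run_register(register_code, env, "t1")
```

Both runs compute the same value. The register version dispatches fewer instructions, but each one carries three operand names, which is roughly the size-for-speed trade mentioned above.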

delnan
+3  A: 

LLVM gives you a much better programming model than straight x86 assembly. Yes, it's low-level. But you don't have to worry about register scheduling or fully optimizing your output. Also, while you're still writing your front-end, you can take advantage of its type system to catch mistakes you might make.

That said, you'll have to develop your own runtime layer to take care of the "dynamic" parts of your language. Just for that part alone, I might tend to stick with CLR.
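As an illustration of what such a runtime layer has to do, here is a minimal sketch in Python, assuming a language with only integers and strings: every value carries a type tag, and each operation inspects the tags at run time before doing any work. `Value` and `dyn_add` are hypothetical names; a real runtime would also need memory management, method lookup, and much more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A boxed dynamic value: a type tag plus the raw payload."""
    tag: str        # "int" or "str" in this sketch
    payload: object


def dyn_add(a, b):
    """Runtime helper that compiled code would call for the source-level `+`."""
    if a.tag == "int" and b.tag == "int":
        return Value("int", a.payload + b.payload)
    if a.tag == "str" and b.tag == "str":
        return Value("str", a.payload + b.payload)
    # The type error surfaces at run time, not at compile time.
    raise TypeError(f"cannot add {a.tag} and {b.tag}")


five = dyn_add(Value("int", 2), Value("int", 3))
greeting = dyn_add(Value("str", "foo"), Value("str", "bar"))
```

On a low-level target, the compiler would emit calls to helpers like this for every dynamic operation; VMs built with dynamic languages in mind ship much of this machinery already.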

Karmastan
You *still* need to worry about fully optimizing your output. And it's questionable whether register scheduling is of any use on x86. What performance benefits do you get from its 6 "general purpose" registers?
Cheery
LLVM's instruction set exposes an unbounded number of SSA registers. As the programmer, you don't have to worry about fitting all your variables into the number of registers a platform actually offers or wondering how best to spill them to the stack.
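A small sketch of what that means for a front end, written in Python: the emitter below walks an expression tree and simply asks for a fresh temporary for every intermediate result, never reusing or spilling anything. The `emit` function and the tuple-based expression format are hypothetical, and the output only imitates the style of LLVM's textual IR (with every value assumed to be `i32`); it is not a complete module.

```python
import itertools


def emit(expr, out, fresh):
    """Append instructions for expr to out; return the name holding its value.

    expr is either a variable name (str) or a tuple (op, lhs, rhs).
    """
    if isinstance(expr, str):
        return "%" + expr
    op, lhs, rhs = expr
    left = emit(lhs, out, fresh)
    right = emit(rhs, out, fresh)
    dst = f"%t{next(fresh)}"          # a brand-new register, assigned exactly once
    out.append(f"{dst} = {op} i32 {left}, {right}")
    return dst


lines = []
result = emit(("add", "a", ("mul", "b", "c")), lines, itertools.count())
# lines now holds:
#   %t0 = mul i32 %b, %c
#   %t1 = add i32 %a, %t0
```

Deciding which of those temporaries end up in real machine registers, and which get spilled, is left to the back end.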
Karmastan
In LLVM, getting to the IR is only the first step, and it is done by the compiler front end. Most optimization is done on that IR in the back end. If you write a tool that outputs IR, you can take advantage of the existing back-end optimizer, so you don't have to, for instance, design your system to perform loop-invariant code motion.
Karmastan