
I've got a few languages that I've been building as interpreters. When I'm ready to take "that next step", what are the best options for non-native compiled formats, and what are the pros and cons of each?

I've been looking at compiling to the CLR or LLVM, and I've contemplated compiling to C as an intermediate step a few times, but I'm not completely certain.

A few features I'm hoping to be able to port are as follows:

  1. REPL - One of the languages I'm building supports block-level evaluation during runtime.
  2. Robust Macros - One of the languages I'm building requires the ability to filter through code separately before tokenizing, and again in the intermediate step between tokenizing and parsing.
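Both macro stages described above live in the front end, so one way to keep them portable across any backend is to make them explicit passes that run before the AST exists. A minimal sketch, assuming a toy grammar of integers joined by `+`/`-`; the hook names (`source_filters`, `token_filters`) and the example macros are hypothetical:

```python
import re
from typing import Callable, Iterable, List, Union

# Hypothetical toy AST: an int, or (op, lhs, rhs).
Ast = Union[int, tuple]
SourceFilter = Callable[[str], str]             # runs on raw text, before tokenizing
TokenFilter = Callable[[List[str]], List[str]]  # runs on tokens, before parsing

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|\S)")

def tokenize(src: str) -> List[str]:
    """Split source into integers, identifiers and single-character symbols."""
    return TOKEN_RE.findall(src)

def parse(tokens: List[str]) -> Ast:
    """Parse INT (('+' | '-') INT)* into a left-nested tuple."""
    if not tokens or len(tokens) % 2 == 0:
        raise SyntaxError(f"malformed expression: {tokens}")
    node: Ast = int(tokens[0])
    for i in range(1, len(tokens) - 1, 2):
        node = (tokens[i], node, int(tokens[i + 1]))
    return node

def compile_pipeline(src: str,
                     source_filters: Iterable[SourceFilter] = (),
                     token_filters: Iterable[TokenFilter] = ()) -> Ast:
    for f in source_filters:   # macro stage 1: text-level rewriting
        src = f(src)
    tokens = tokenize(src)
    for f in token_filters:    # macro stage 2: token-level rewriting
        tokens = f(tokens)
    return parse(tokens)

# Example macros (hypothetical):
def strip_comments(src: str) -> str:
    return re.sub(r"#[^\n]*", "", src)

CONSTANTS = {"TEN": "10"}

def expand_constants(tokens: List[str]) -> List[str]:
    return [CONSTANTS.get(t, t) for t in tokens]
```

For example, `compile_pipeline("TEN + 2 # note", [strip_comments], [expand_constants])` yields `("+", 10, 2)`. Because both stages finish before parsing, the same front end could feed an interpreter, LLVM, the CLR or a C generator unchanged.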

Ok, not really "a few", just two. I like to think I can port any other features my languages support to "anything".

What are my best options, and their pros/cons?

+3  A: 

LLVM seems promising. The team claims that gcc with their backend produces faster code than gcc with its native backend. The ability to compile from the AST is really interesting (take a look at the tutorial). It can compile and optimize at runtime, which is a must for dynamic languages. It can also run as a pure interpreter.
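To give a feel for what compiling from the AST involves, here is a minimal sketch that walks a toy expression AST and emits textual LLVM IR. The AST format, the function name and the restriction to `i32` are assumptions for illustration; the output is ordinary `.ll` text that LLVM's tools (for example `llc`) can consume, and a real frontend might instead build the same IR in memory through LLVM's API or a binding.

```python
from itertools import count
from typing import List, Union

# Hypothetical toy AST: int literal, parameter name, or (op, lhs, rhs).
Ast = Union[int, str, tuple]

LLVM_OPS = {"+": "add", "-": "sub", "*": "mul"}

def compile_to_llvm_ir(name: str, params: List[str], body: Ast) -> str:
    """Emit a textual LLVM IR function computing `body` over i32 values."""
    lines: List[str] = []
    temps = count(1)  # named temporaries %t1, %t2, ... avoid numbering rules

    def emit(node: Ast) -> str:
        # Returns the IR operand (constant or SSA register) holding node's value.
        if isinstance(node, int):
            return str(node)
        if isinstance(node, str):
            return f"%{node}"
        op, lhs, rhs = node
        a, b = emit(lhs), emit(rhs)
        reg = f"%t{next(temps)}"
        lines.append(f"  {reg} = {LLVM_OPS[op]} i32 {a}, {b}")
        return reg

    result = emit(body)
    args = ", ".join(f"i32 %{p}" for p in params)
    return "\n".join([f"define i32 @{name}({args}) {{", "entry:", *lines,
                      f"  ret i32 {result}", "}"])

print(compile_to_llvm_ir("f", ["x", "y"], ("+", ("*", "x", 2), "y")))
# Prints:
# define i32 @f(i32 %x, i32 %y) {
# entry:
#   %t1 = mul i32 %x, 2
#   %t2 = add i32 %t1, %y
#   ret i32 %t2
# }
```

Each AST node becomes at most one SSA instruction, which is why LLVM is a comfortable target for a tree-walking code generator.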

I am considering using LLVM in a project that involves creating a Tcl-like language. Tcl is heavily dynamic, so I don't know yet what this implies, but I'm confident that I'll get better performance than with the current bytecode-based core.

fbonnet
+12  A: 

Pros/cons:

  • CLR:

    • pro: CLR environment readily available; a lot of stuff to bind to
    • con: bound to the CLR (;-); targeting some systems will be hard or impossible (embedded, mainframes, etc.; CLR implementations might be less mature on non-MS systems)
  • LLVM:

    • pro: independent from MS.
    • con: targeting some systems might involve porting LLVM (?); interfacing to .NET, Java, etc. might be troublesome (possibly needs an FFI)
  • C as target language:

    • pro: almost all targets possible; easy code generation
    • con: you will have to implement some VM stuff as a runtime library (GC, dynamic loading, dynamic compilation, etc.); some things are hard to do in C (continuations, backtracking, stack tracing, exceptions); some things are hard to do both efficiently and portably in C (GC, dynamic types, stack layout dependence).
  • Java ByteCode as target:

    • pro: probably the biggest set of possible target platforms (even mobile phones and embedded stuff); a lot of existing tools around; easy interfacing to existing libraries
    • con: some things are hard to implement or hard to implement efficiently (dynamic types, continuations, backtracking)
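The C entry above pairs "easy code generation" with "you will have to implement some VM stuff as a runtime library". A minimal sketch showing both sides, assuming a toy expression AST and a dynamically typed source language: emitting C text is a few lines, but every operation turns into a call into a runtime library. The names `value_t`, `rt_int`, `rt_add`, `rt_sub` and `rt_mul` are hypothetical and would have to be written by hand in C, along with the GC.

```python
from typing import List, Union

# Hypothetical toy AST: int literal, variable name, or (op, lhs, rhs).
Ast = Union[int, str, tuple]

# Arithmetic on dynamically typed values is delegated to a hypothetical
# runtime library (rt_int, rt_add, ... and the boxed type value_t).
RUNTIME_OPS = {"+": "rt_add", "-": "rt_sub", "*": "rt_mul"}

def expr_to_c(node: Ast) -> str:
    """Translate an expression AST into a C expression string."""
    if isinstance(node, int):
        return f"rt_int({node})"        # box the literal
    if isinstance(node, str):
        return node                     # variables map to C parameters
    op, lhs, rhs = node
    return f"{RUNTIME_OPS[op]}({expr_to_c(lhs)}, {expr_to_c(rhs)})"

def function_to_c(name: str, params: List[str], body: Ast) -> str:
    """Emit a C function definition returning the boxed result of body."""
    args = ", ".join(f"value_t {p}" for p in params) or "void"
    return f"value_t {name}({args}) {{\n    return {expr_to_c(body)};\n}}"
```

For `function_to_c("f", ["x", "y"], ("+", ("*", "x", 2), "y"))` this produces `return rt_add(rt_mul(x, rt_int(2)), y);` inside `value_t f(value_t x, value_t y)`. Note that the sketch maps source identifiers straight to C identifiers; a real generator would mangle names to avoid clashes with C keywords.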

From all the above, I think targeting Java ByteCode would probably be best for you.

EDIT: this is actually an answer to a comment, but 300 characters are not enough.

On Java bytecode being iffy: I agree (being a Smalltalker, I find Java bytecode too limiting).

VM-wise, I think there is a relatively wide range of performance you can get from a JVM, from pure, slow bytecode interpreters up to high-end, sophisticated JITting VMs (IBM). I guess CLR VMs will catch up, as MS sooner or later steals and integrates every innovation anyway, and the techniques to speed up dynamic translation are published (read the Self papers, for example). LLVM will probably progress a bit more slowly, but who knows. With C, you will benefit from better compilers for free, but things like dynamic retranslation are hard to implement with C as the target. My own system uses a mixture of precompiled and dynamically compiled code: it has a slow bytecode interpreter, a JITter, and precompiled static C code, all in one memory space.
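The mixed-mode idea above (a slow interpreter and compiled code for hot paths, living in one process) can be sketched with a call counter that promotes a function from the interpreter to a compiled form. This is a toy under stated assumptions: the instruction set and threshold are hypothetical, and the "compiler" only turns bytecode into nested Python closures rather than machine code. It is not a description of the answerer's system.

```python
from typing import Callable, Dict, List, Optional, Tuple

Instr = Tuple[str, object]            # hypothetical stack-machine instruction
Env = Dict[str, int]
Compiled = Callable[[Env], int]

def interpret(code: List[Instr], env: Env) -> int:
    """Slow tier: decode and dispatch every instruction on every call."""
    stack: List[int] = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(env[arg])
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

def compile_code(code: List[Instr]) -> Compiled:
    """Fast-tier stand-in: decode once into nested closures (not machine code)."""
    stack: List[Compiled] = []
    for op, arg in code:
        if op == "push":
            stack.append(lambda env, v=arg: v)
        elif op == "load":
            stack.append(lambda env, n=arg: env[n])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(lambda env, a=a, b=b: a(env) + b(env))
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(lambda env, a=a, b=b: a(env) * b(env))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

class TieredFunction:
    """Run in the interpreter until `threshold` calls, then switch tiers."""

    def __init__(self, code: List[Instr], threshold: int = 3):
        self.code = code
        self.threshold = threshold
        self.calls = 0
        self.compiled: Optional[Compiled] = None

    def __call__(self, **env: int) -> int:
        if self.compiled is None:
            self.calls += 1
            if self.calls < self.threshold:
                return interpret(self.code, env)
            self.compiled = compile_code(self.code)   # function became "hot"
        return self.compiled(env)
```

With `code = [("load", "x"), ("push", 2), ("mul", None), ("load", "y"), ("add", None)]` and `threshold=2`, the first call to `TieredFunction(code, threshold=2)(x=3, y=1)` is interpreted and later calls run the compiled closure; both tiers return 7. Keeping both tiers behind one call interface is what lets interpreted and compiled code share a memory space.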

blabla999
Java ByteCode is something I've always been iffy about. Call it bad past experience. Do any of them have any perks regarding the power of their internal VM (other than just library calls)?
C code generation looks easy until you've been doing it for 6 to 18 months. Then suddenly things become impossible.
Norman Ramsey
+12  A: 

Code generation is my business :-)

Comments on a few options:

  • CLR:

    • Pro: industrial support
    • Con: you have to buy into their type system pretty much completely; depending on what you want to do with types, this may not matter
    • Con: Only Windows platform is really prime-time quality
  • LLVM:

    • Pro: enthusiastic user community with charismatic leader
    • Pro: many interesting performance improvements
    • Con: somewhat complex interface
    • Con: history of holes in the engineering; as LLVM matures expect the holes in the engineering to be plugged by adding to the complexity of the interface
  • C--:

    • Pro: target is an actual written language, not an API; you can easily inspect, debug, and edit your C-- code
    • Pro: design is reasonably mature and reasonably clean
    • Pro: supports accurate garbage collection
    • Pro: most users report it is very easy to use
    • Con: very small development team
    • Con: as of early 2009, supports only three hardware platforms (x86, PPC, ARM)
    • Con: does not ship with a garbage collector
    • Con: future of project is uncertain
  • C as target language:

    • Pro: looks easy
    • Con: nearly impossible to get decent performance
    • Con: will drive you nuts in the long run; ask the long line of people who have tried to compile Haskell, ML, Modula-3, Scheme and more using this technique. At some point every one of these people gave up and built their own native code generator.
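As one concrete instance of the pain, Scheme requires proper tail calls and C compilers do not guarantee them, so one common workaround is a trampoline: every tail call returns a description of the next call to a driver loop instead of making it. A minimal sketch in Python (which also lacks tail-call elimination); `TailCall` and `trampoline` are illustrative names, not a particular compiler's runtime.

```python
from typing import Any, Callable

class TailCall:
    """A call that is returned to the trampoline instead of being made."""
    __slots__ = ("fn", "args")

    def __init__(self, fn: Callable[..., Any], *args: Any):
        self.fn = fn
        self.args = args

def trampoline(fn: Callable[..., Any], *args: Any) -> Any:
    """Run fn, then keep running returned TailCalls until a plain value appears."""
    result = fn(*args)
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result

# Mutually tail-recursive functions in trampolined style: neither function
# calls the other directly, so the host stack never grows.
def is_even(n: int) -> Any:
    return True if n == 0 else TailCall(is_odd, n - 1)

def is_odd(n: int) -> Any:
    return False if n == 0 else TailCall(is_even, n - 1)
```

`trampoline(is_even, 100000)` returns `True` without hitting Python's recursion limit, but every tail call now costs an allocation plus an indirect call through the driver loop, which illustrates why such generated code tends to be slow.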

Summary: anything except C is a reasonable choice. For the best combination of flexibility, quality, and expected longevity, I'd probably recommend LLVM.

Full disclosure: I am affiliated with the C-- project.

Norman Ramsey