views:

870

answers:

4

I've been playing around with LLVM hoping to learn how to use it.

However, my mind is boggled by the level of complexity of the interface.

Take for example their Fibonacci function

int fib(int x) {
    if (x <= 2)
        return 1;
    return fib(x - 1) + fib(x - 2);
}

Getting this to output LLVM IR takes 61 lines of code!

They also include a Brainfuck example. Brainfuck is known for having the smallest compiler (200 bytes). Unfortunately, with LLVM, it is over 600 lines (18 KB).

Is this the norm for compiler backends? So far it seems like it would be far easier to do an assembly or C backend.
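For comparison, here is a minimal sketch of what a C backend for Brainfuck might look like. It is only an illustration of the "emit C source" approach mentioned above, not the LLVM example itself. The function name `brainfuckToC`, the 30000-cell tape and the choice to store the truncated result of `getchar()` on end of input are all assumptions made for this sketch.

```cpp
#include <string>

// Hypothetical helper (illustration only): translates Brainfuck source
// into the text of a C program, one C statement per Brainfuck command.
std::string brainfuckToC(const std::string& src) {
    std::string out =
        "#include <stdio.h>\n"
        "int main(void) {\n"
        "  static unsigned char tape[30000];\n"  // assumed tape size
        "  unsigned char *p = tape;\n";
    for (char c : src) {
        switch (c) {
            case '>': out += "  ++p;\n"; break;
            case '<': out += "  --p;\n"; break;
            case '+': out += "  ++*p;\n"; break;
            case '-': out += "  --*p;\n"; break;
            case '.': out += "  putchar(*p);\n"; break;
            // Assumption: no special EOF handling, the value is just truncated.
            case ',': out += "  *p = (unsigned char)getchar();\n"; break;
            case '[': out += "  while (*p) {\n"; break;
            case ']': out += "  }\n"; break;
            default: break;  // every other character is a comment
        }
    }
    out += "  return 0;\n}\n";
    return out;
}
```

The sketch does no bracket matching or optimisation, so unbalanced input produces C that does not compile. The emitted text still has to go through a separate C compiler.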

+1  A: 

Doesn't LLVM then optimise the IR depending on the specific architecture implemented in the back-end? The IR code is not directly translated 1:1 into the final binary. As far as I understand it, that's how it works. However, I have only started to play around with the back-end (I'm porting it over to a custom processor).

sybreon
I'm not talking about the end size. I'm talking about the code needed to MAKE the IR.
Unknown
+1  A: 

LLVM does require some boilerplate code, but once you understand it, it is really quite simple. Try looking for a simple GCC front end, and you will realize how clean LLVM is. I would definitely recommend LLVM over C or ASM. ASM is not portable at all, and generating source code is usually a bad thing, because it makes compiling slow.

Zifre
What about compiling to LLVM IR? Do you know if it is stable enough?
Unknown
LLVM IR works, but it has many of the same problems as compiling to C. If you are using C++ for the compiler, using the libraries is a lot easier.
Zifre
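To make the "compile to LLVM IR as text" option from the comments above concrete, here is a rough sketch that emits textual IR for the Fibonacci function in the question using only the C++ standard library. The `IRText` class and `emitFib` function are hypothetical names invented for this sketch, not part of LLVM. The value names (`%cond`, `%x1`, and so on) are picked by hand, which is one of the chores the LLVM libraries would otherwise handle.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not an LLVM API): accumulates textual LLVM IR.
class IRText {
public:
    // Append an unindented line (function header, block label, closing brace).
    void line(const std::string& text) { out_ << text << '\n'; }

    // Append "%name = rhs" and return "%name" for use as an operand.
    std::string def(const std::string& name, const std::string& rhs) {
        assert(!name.empty() && "an SSA value needs a name");
        out_ << "  %" << name << " = " << rhs << '\n';
        return "%" + name;
    }

    // Append an instruction that produces no value (br, ret).
    void inst(const std::string& text) { out_ << "  " << text << '\n'; }

    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// Emits IR for: int fib(int x) { if (x <= 2) return 1; return fib(x-1) + fib(x-2); }
std::string emitFib() {
    IRText ir;
    ir.line("define i32 @fib(i32 %x) {");
    ir.line("entry:");
    std::string cond = ir.def("cond", "icmp sle i32 %x, 2");
    ir.inst("br i1 " + cond + ", label %base, label %recurse");
    ir.line("base:");
    ir.inst("ret i32 1");
    ir.line("recurse:");
    std::string x1 = ir.def("x1", "sub i32 %x, 1");
    std::string f1 = ir.def("f1", "call i32 @fib(i32 " + x1 + ")");
    std::string x2 = ir.def("x2", "sub i32 %x, 2");
    std::string f2 = ir.def("f2", "call i32 @fib(i32 " + x2 + ")");
    std::string sum = ir.def("sum", "add i32 " + f1 + ", " + f2);
    ir.inst("ret i32 " + sum);
    ir.line("}");
    return ir.str();
}
```

The string returned by `emitFib()` can be written to a `.ll` file and handed to the LLVM command-line tools (for example `llvm-as` or `llc`). As with a C backend, nothing checks the text until that later step, so mistakes in types or names only show up when the tool rejects the file.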
+5  A: 

The problem lies with C++ and not LLVM.

Use a language designed for metaprogramming, like OCaml, and your compiler will be vastly smaller. For example, this OCaml Journal article describes an 87-line LLVM-based Brainfuck compiler. This mailing list post describes a complete programming language implementation, including a parser, that can compile the Fibonacci function (amongst other programs); the whole compiler is under 100 lines of OCaml code using LLVM. And HLVM is a high-level virtual machine with multicore-capable garbage collection in under 2,000 lines of OCaml code using LLVM.

Jon Harrop
Thanks for the suggestion Jon. Unfortunately programming in OCaml is still difficult for me to get the hang of since I am mostly a procedural programmer.
Unknown
Even if you include the time taken to learn OCaml, it will still be faster to write a production-quality compiler in OCaml than in C++. I cannot recommend OCaml strongly enough for this purpose.
Jon Harrop
+1  A: 

Intermediate representations can be a bit verbose, compared with non-virtual assembler. I learned that looking at .NET IL, though I never went much further than looking. I'm not really familiar with LLVM, but I guess it's the same issue.

It kind of makes sense when you think about it, though. One big difference is that IRs have to deal with a lot of metadata. In assembler there is very little - the processor implicitly defines a lot, and conventions for things like function calls are left to the programmer/compiler to define. That's convenient, but it creates big portability and interop issues.

Intermediate representations such as .NET IL and LLVM IR care about making sure that separately compiled components can work together, even components written in different languages and compiled by different compiler front ends. That means metadata is needed to describe what is going on at a higher level than, e.g., arbitrary pushes, pops and loads that might be parameter handling but could be just about anything. The payoff is pretty big, but there's a price to pay.

There are other issues, too. The intermediate representation isn't really meant to be written by humans, but it is meant to be readable. Also, it's meant to be general enough to survive a number of versions without a completely incompatible, from-scratch redesign.

Basically, in this context, explicit is almost always better than implicit, so verbosity is hard to avoid.

Steve314