ansaurus

Question

Answer 1

+1 A:

Doesn't LLVM then optimise the IR for the specific architecture targeted by the back-end? The IR is not translated 1:1 into the final binary. As far as I understand it, that's how it works. However, I have only just started to play around with the back-end (I'm porting it to a custom processor).

sybreon 2009-04-09 07:58:09

I'm not talking about the end size. I'm talking about the code needed to MAKE the IR.

Unknown 2009-04-09 07:59:53

Answer 2

+1 A:

LLVM does require some boilerplate code, but once you understand it, it is really quite simple. Try looking for a simple GCC front end, and you will realize how clean LLVM is. I would definitely recommend LLVM over C or assembly. Assembly is not portable at all, and generating C source code is usually a bad idea because it makes compiling slow.
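To give a feel for that boilerplate, here is a minimal sketch in Python that emits textual LLVM IR instead of calling LLVM's C++ API (a real C++ front end would more likely use LLVM's `IRBuilder` class). `FunctionBuilder` and its methods are hypothetical names chosen for illustration. The chores it shows are the ones a front end repeats for every construct: declaring typed parameters, opening a labelled basic block, and giving each result a fresh SSA name, since an SSA value may only be assigned once.

```python
class FunctionBuilder:
    """Hypothetical helper: accumulates one LLVM IR function in textual form."""

    def __init__(self, name, ret_type, params):
        self.name = name
        self.ret_type = ret_type
        self.params = params      # list of (llvm_type, param_name) pairs
        self.lines = []
        self.counter = 0

    def fresh(self):
        # Each SSA value is assigned exactly once, so every result needs a new name.
        self.counter += 1
        return f"%t{self.counter}"

    def block(self, label):
        # Start a new basic block.
        self.lines.append(f"{label}:")

    def binop(self, op, ty, lhs, rhs):
        # Emit a binary instruction and return the SSA name holding its result.
        result = self.fresh()
        self.lines.append(f"  {result} = {op} {ty} {lhs}, {rhs}")
        return result

    def ret(self, ty, value):
        self.lines.append(f"  ret {ty} {value}")

    def render(self):
        args = ", ".join(f"{ty} %{name}" for ty, name in self.params)
        body = "\n".join(self.lines)
        return f"define {self.ret_type} @{self.name}({args}) {{\n{body}\n}}\n"


# Lower the expression (a + b) * 2 on 32-bit integers.
fb = FunctionBuilder("f", "i32", [("i32", "a"), ("i32", "b")])
fb.block("entry")
s = fb.binop("add", "i32", "%a", "%b")
p = fb.binop("mul", "i32", s, "2")
fb.ret("i32", p)
print(fb.render())
```

Run as-is, it prints a single `entry` block with an `add`, a `mul` and `ret i32 %t2`. Even this tiny expression needs a signature, a block and two fresh names, which is where the verbosity comes from.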

Zifre 2009-04-09 14:46:36

What about compiling to LLVM IR? Do you know if it is stable enough?

Unknown 2009-04-09 19:02:44

LLVM IR works, but it has many of the same problems as compiling to C. If you are writing the compiler in C++, using the LLVM libraries directly is a lot easier.

Zifre 2009-04-09 20:03:29

Answer 3

+5 A:

The problem lies with C++ and not LLVM.

Use a language designed for metaprogramming, like OCaml, and your compiler will be vastly smaller. For example, this OCaml Journal article describes an 87-line LLVM-based Brainfuck compiler. This mailing list post describes a complete programming language implementation, including a parser, that can compile the Fibonacci function (among other programs); the whole compiler is under 100 lines of OCaml code using LLVM. Finally, HLVM is a high-level virtual machine with multicore-capable garbage collection in under 2,000 lines of OCaml code using LLVM.
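For a rough sense of how little code such a compiler needs, here is a sketch in Python rather than OCaml (it is not the article's code) that translates Brainfuck into a textual LLVM IR module. `compile_bf` is a hypothetical name. The sketch assumes LLVM 15 or later, whose IR uses opaque `ptr` types, keeps the tape in a zero-initialised global without bounds checking, and relies on the C library's `putchar` and `getchar` for I/O.

```python
def compile_bf(source: str, tape_size: int = 30000) -> str:
    """Translate Brainfuck source into a textual LLVM IR module.

    Assumes opaque-pointer syntax (`ptr`), as used by LLVM 15 and later.
    The tape is a zero-initialised global (no bounds checks); %idx holds
    the current cell index.
    """
    code = []
    counter = 0
    loop_count = 0
    open_loops = []  # ids of loops whose closing ']' has not been seen yet

    def fresh() -> str:
        # Every SSA value is assigned exactly once, so each result gets a new name.
        nonlocal counter
        counter += 1
        return f"%v{counter}"

    def cell_ptr() -> str:
        # Load the current index and compute the address of that tape cell.
        idx, ptr = fresh(), fresh()
        code.append(f"  {idx} = load i64, ptr %idx")
        code.append(f"  {ptr} = getelementptr inbounds [{tape_size} x i8], "
                    f"ptr @tape, i64 0, i64 {idx}")
        return ptr

    for ch in source:
        if ch in "><":
            old, new = fresh(), fresh()
            step = 1 if ch == ">" else -1
            code += [f"  {old} = load i64, ptr %idx",
                     f"  {new} = add i64 {old}, {step}",
                     f"  store i64 {new}, ptr %idx"]
        elif ch in "+-":
            ptr = cell_ptr()
            old, new = fresh(), fresh()
            step = 1 if ch == "+" else -1
            code += [f"  {old} = load i8, ptr {ptr}",
                     f"  {new} = add i8 {old}, {step}",
                     f"  store i8 {new}, ptr {ptr}"]
        elif ch == ".":
            ptr = cell_ptr()
            byte, wide = fresh(), fresh()
            code += [f"  {byte} = load i8, ptr {ptr}",
                     f"  {wide} = zext i8 {byte} to i32",
                     f"  call i32 @putchar(i32 {wide})"]
        elif ch == ",":
            ptr = cell_ptr()
            got, byte = fresh(), fresh()
            # EOF (-1) is truncated to 255; Brainfuck dialects disagree on EOF.
            code += [f"  {got} = call i32 @getchar()",
                     f"  {byte} = trunc i32 {got} to i8",
                     f"  store i8 {byte}, ptr {ptr}"]
        elif ch == "[":
            loop_count += 1
            n = loop_count
            open_loops.append(n)
            # Close the current block and test the cell at the loop head.
            code += [f"  br label %head{n}", f"head{n}:"]
            ptr = cell_ptr()
            byte, cond = fresh(), fresh()
            code += [f"  {byte} = load i8, ptr {ptr}",
                     f"  {cond} = icmp ne i8 {byte}, 0",
                     f"  br i1 {cond}, label %body{n}, label %exit{n}",
                     f"body{n}:"]
        elif ch == "]":
            if not open_loops:
                raise ValueError("unmatched ']'")
            n = open_loops.pop()
            code += [f"  br label %head{n}", f"exit{n}:"]
        # Any other character is a comment in Brainfuck and is skipped.

    if open_loops:
        raise ValueError("unmatched '['")

    module = [
        f"@tape = internal global [{tape_size} x i8] zeroinitializer",
        "declare i32 @putchar(i32)",
        "declare i32 @getchar()",
        "",
        "define i32 @main() {",
        "entry:",
        "  %idx = alloca i64",
        "  store i64 0, ptr %idx",
    ]
    return "\n".join(module + code + ["  ret i32 0", "}"]) + "\n"


if __name__ == "__main__":
    # Sets cell 1 to 8 * 8 + 1 = 65, so the built program should print "A".
    print(compile_bf("++++++++[>++++++++<-]>+."))
```

With a recent LLVM toolchain, saving the output as `bf.ll` and running `clang bf.ll -o bf` should produce a native program. Most of the length is bookkeeping for SSA names and basic-block labels, which is the part a language with good symbolic data types tends to shorten.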

Jon Harrop 2009-05-12 18:54:33

Thanks for the suggestion, Jon. Unfortunately, I still find OCaml difficult to get the hang of, since I am mostly a procedural programmer.

Unknown 2009-05-12 21:05:57

Even if you include the time taken to learn OCaml, it will still be faster to write a production-quality compiler in OCaml rather than using C++. I cannot recommend OCaml strongly enough for this purpose.

Jon Harrop 2009-05-20 10:24:24

Answer 4

+1 A:

Intermediate representations can be a bit verbose compared with non-virtual assembler. I learned that from looking at .NET IL, though I never went much further than looking. I'm not really familiar with LLVM, but I guess it has the same issue.

It kind of makes sense when you think about it, though. One big difference is that IRs have to deal with a lot of metadata. In assembler there is very little: the processor implicitly defines a lot, and conventions for things like function calls are left to the programmer or compiler to define. That's convenient, but it creates big portability and interop issues.

Intermediate representations such as .NET IL and LLVM IR care about making sure that separately compiled components can work together, even components written in different languages and compiled by different compiler front ends. That means metadata is needed to describe what is going on at a higher level than, e.g., arbitrary pushes, pops and loads that might be parameter handling but could be just about anything. The payoff is pretty big, but there's a price to pay.
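In LLVM's textual IR this explicitness shows up at every module boundary: a module that calls a function compiled elsewhere declares its return type, parameter types and, when it is not the default, its calling convention, and every call site restates those types. The Python sketch below uses a hypothetical `Signature` class to show how a front end might keep that interface in one place and render both the `declare` line and a matching call from it. `fastcc` is one of LLVM's real calling-convention keywords; everything else is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical record of what a separately compiled caller needs to know."""
    name: str
    ret: str          # LLVM return type, e.g. "i32"
    params: tuple     # LLVM parameter types, e.g. ("ptr", "i64")
    cconv: str = ""   # calling-convention keyword such as "fastcc"; "" = default C convention

    def _cc(self) -> str:
        return f"{self.cconv} " if self.cconv else ""

    def declare(self) -> str:
        # The declaration spells out the whole interface; nothing is left to convention.
        return f"declare {self._cc()}{self.ret} @{self.name}({', '.join(self.params)})"

    def call(self, result: str, args: list) -> str:
        # This sketch's own sanity check: argument count must match the signature.
        if len(args) != len(self.params):
            raise ValueError(f"@{self.name} expects {len(self.params)} arguments, got {len(args)}")
        # Every call site restates the return and argument types explicitly.
        typed = ", ".join(f"{ty} {arg}" for ty, arg in zip(self.params, args))
        return f"{result} = call {self._cc()}{self.ret} @{self.name}({typed})"


lookup = Signature("lookup", "i32", ("ptr", "i64"))
print(lookup.declare())                       # declare i32 @lookup(ptr, i64)
print(lookup.call("%r", ["%table", "%key"]))  # %r = call i32 @lookup(ptr %table, i64 %key)
```

Compare this with assembly, where the same call might be a few pushes and a `call` whose meaning depends entirely on an unwritten convention.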

There are other issues, too. The intermediate representation isn't really meant to be written by humans, but it is meant to be readable. It's also meant to be general enough to survive a number of versions without an incompatible, from-scratch redesign.

Basically, in this context, explicit is almost always better than implicit, so verbosity is hard to avoid.

Steve314 2009-10-09 01:48:07

Questions for compiling to LLVM
