I want to know how to design a compiler that compiles very, very quickly.

First, let me head off some obvious misunderstandings of my question:

  1. I am not talking about the speed of the code produced by the compiler. There are already many resources available for learning how to optimize generated code. What I'm having trouble finding is information on making the compiler fast.

  2. I'm also not interested in a discussion of why C++ compilers are generally slower than Java compilers (for example). I'm interested in what techniques can be used to speed up the compiler for any given language.

  3. I also don't want to hear about distributed compilation systems like Microsoft's Incredibuild or Unix's distcc. Those systems don't give you faster compilers, they just give you more compilers. Which is certainly useful, but that's not the question I'm asking. I want to know how to design a fast compiler for a single CPU.

  4. Nor is ccache the answer I'm looking for. That's a system that allows you to avoid using the compiler at all, but it doesn't make the compiler faster. Again, that's useful; again, that's not the question I'm asking.

I hope my question is now crystal clear. But perhaps some history will make it even clearer.

C compilers used to be really slow. Then, in 1986, THINK Technologies introduced Lightspeed C for Macintosh, and it compiled programs almost instantaneously. Lightspeed C was so much faster than all the other C compilers that there was hardly any comparison. (Perhaps Lightspeed C wasn't the first of the new generation of lightning-fast compilers, but it was the first in my experience. Turbo Pascal came earlier [1983] but I had no experience with it, so I don't know how it compared, speed-wise.)

Since then, many fast compilers have been available. It seems that there was some kind of quantum leap in compiler technology in the 1980's, and that in particular is what I'm trying to understand. What was the breakthrough?

The answer may be this simple: with IDEs like Lightspeed and Turbo, the integrated editor already has the source code in RAM. If the compiler operates on that data, it eliminates disk I/O, which is the slowest part of any compile. That's probably a very important contributor to the speed improvement, provided the source code is small relative to the memory size. (In those days, RAM sizes were much smaller, but then so were typical program sizes.)

Is that it? Or were there other important innovations involved? And have there been important improvements in compiler speed since then?

A: 

C++ compilers are slower than Java compilers, mainly because they (usually) produce optimized native code, while Java compilers generate not-particularly-optimized bytecode and leave the final optimization and native code generation to the JIT compiler (performed at run time). Since serious optimizations require knowledge of the native code, there's a limit to how much the bytecode compiler can do.

Now, I can't comment on Lightspeed (since I know nothing about it), but in the case of Lattice & Microsoft C (slow) vs Borland Turbo C (fast), Borland kept all the files in memory and compiled them there (if your program crashed, it might take down the IDE, losing your unsaved source code!). Microsoft's IDE would always save the files to disk, and then launch a separate program to read the disk and compile.

Using precompiled header files also helped speed up C/C++ compiles.

Another thing that helps speed up compiles is a language designed to allow single-pass compilation. For example, Pascal requires every function to be defined (not merely declared, as in C++) before it is used (which is why the main program block must come last in the source file).
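
To illustrate the single-pass point with a small, purely illustrative C sketch: by the time the compiler reaches a call, it has already seen the callee's full definition, so it can emit code for the call immediately, with no second pass and no fix-up list.

    /* Illustrative only: in a single-pass compiler, everything a call site
     * needs must already have been seen when the call is reached. */

    /* Defined (not just declared) before use, as single-pass Pascal requires. */
    static int square(int x) {
        return x * x;
    }

    int main(void) {
        /* By the time the compiler reads this call, it already knows
         * square's parameter and return types, so it can emit the call
         * immediately -- no second pass, no fix-up list needed. */
        return square(7);
    }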

James Curran
Please read the question completely. Most of your response falls into the categories I _explicitly_ said were beside the point. The point about files-in-memory is just a repeat of part of my question.
Thom Boyer
Note that C++ compilers are also slower because parsing C++ is harder. I don't know why C was designed to be hard to parse, but it was, probably inadvertently. C++ couldn't get away from that, while Java and C# could.
David Thornley
+3  A: 
  • Simple syntax that can be parsed in a single pass.
  • Simple target code. If you don't target machine code directly you can get away with many things.
  • Not compiling at all. If you don't need fast execution, or you're designing mostly for one-off scripts, you don't need to waste time analyzing the code.
  • Don't, I repeat, do not try to outsmart your OS's disk/cache management. Mmap the whole damn file and read it as if you read it from RAM (see the mmap sketch below). If you don't have virtual memory, fast compilation is the least of your worries.
  • Avoid creating XML-DOM-like bloated data structures for the AST. You don't need to animate your operator precedences. Keep pointers into the mmapped data instead of copying stuff around.
  • Profile your code if you want it fast. Always.

Addition:

  • Learn different ways to parse. If you are not extremely confident in your parser-writing skills, use proven parser/lexer generator tools like ANTLR, Lemon, etc.
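
Here's a minimal sketch of the "map the file, point into it" idea (POSIX mmap, error handling trimmed): tokens keep a pointer and length into the mapped buffer instead of copying text around.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    typedef struct {
        const char *start;  /* points into the mmapped source, no copy */
        size_t      len;
    } Token;

    int main(int argc, char **argv) {
        if (argc < 2) return 1;

        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        /* Let the OS page the file in; read it as if it were RAM. */
        const char *src = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

        /* Toy "lexer": record each whitespace-separated word as a token
         * that merely points at the mapped data. */
        Token tok;
        size_t i = 0, n = (size_t)st.st_size;
        while (i < n) {
            while (i < n && (src[i] == ' ' || src[i] == '\n' || src[i] == '\t')) i++;
            size_t start = i;
            while (i < n && src[i] != ' ' && src[i] != '\n' && src[i] != '\t') i++;
            if (i > start) {
                tok.start = src + start;
                tok.len   = i - start;
                printf("%.*s\n", (int)tok.len, tok.start);
            }
        }

        munmap((void *)src, st.st_size);
        close(fd);
        return 0;
    }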
artificialidiot
Please read the question completely. Most of your response falls into the categories I _explicitly_ said were beside the point.
Thom Boyer
There is no special ingredient that makes a compiler, or any other piece of software, run an order of magnitude faster. Clean code, profiling, and not doing stupid stuff is all there is to it.
artificialidiot
Making the right top-level design choices -- picking the right data structures and algorithms, for example -- can easily make your program run an order of magnitude faster than another. Cleaning up your code and profiling usually leads to small improvements, relatively speaking. I believe that the compiler speedup I witnessed in the '80s was caused by different design choices, not just by speeding up the bits and pieces. It's those design choices that I'm trying to understand.
Thom Boyer
By the way, your point about using mmap is a very good one. Speeding up my file I/O code will make very little difference compared to the results of that design decision.
Thom Boyer
If you allow me to speculate, the speedup you mention has more to do with the increased availability of RAM, which lets the compiler avoid writing large intermediate data structures out to magnetic storage. A data structure designed for slow sequential access will often perform very badly under faster random access. Most old languages were either designed to be parsed very, very easily or compiled on mainframes in batch jobs.
artificialidiot
A: 

Today you would certainly make your compiler use all the cores available to it. I'm not writing about distributed compilation but parallel compilation: design your compiler from the ground up to use multiple cores. One obvious approach would be to pipeline the various stages of a compiler. Rewriting an AST could certainly be parallelised too.

And please, spare your typing and don't tell us that this approach is excluded by your 'rules'. Your rules would probably prevent the use of a floating-point unit for optimising floating-point arithmetic, or forbid the use of any processor with a higher clock speed than 1GHz.

If you want to write fast programs for today's computers, write them for today's CPUs, not yesterday's. Today's computers use multicore CPUs.
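
As a rough illustration of the pipelining idea (hypothetical stage functions, error handling omitted), here is a minimal two-stage sketch using POSIX threads: one thread parses translation units while another generates code from whatever has already been parsed.

    /* Rough sketch of a two-stage compiler pipeline using POSIX threads.
     * parse_unit() and generate_code() are hypothetical placeholders for
     * real front-end and back-end stages; the point is the hand-off. */
    #include <pthread.h>
    #include <stdio.h>

    #define QUEUE_SIZE 16

    typedef struct { const char *filename; /* ... AST would live here ... */ } Unit;

    static Unit  queue[QUEUE_SIZE];
    static int   head, tail, count, done;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

    static Unit parse_unit(const char *filename) {      /* front-end stage */
        Unit u = { filename };
        printf("parsed   %s\n", filename);
        return u;
    }

    static void generate_code(Unit u) {                  /* back-end stage */
        printf("codegen  %s\n", u.filename);
    }

    static void *backend_thread(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0 && !done) pthread_cond_wait(&not_empty, &lock);
            if (count == 0 && done) { pthread_mutex_unlock(&lock); break; }
            Unit u = queue[head];
            head = (head + 1) % QUEUE_SIZE;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
            generate_code(u);            /* runs while the parser works ahead */
        }
        return NULL;
    }

    int main(void) {
        const char *files[] = { "a.c", "b.c", "c.c" };
        pthread_t backend;
        pthread_create(&backend, NULL, backend_thread, NULL);

        for (int i = 0; i < 3; i++) {
            Unit u = parse_unit(files[i]);
            pthread_mutex_lock(&lock);
            while (count == QUEUE_SIZE) pthread_cond_wait(&not_full, &lock);
            queue[tail] = u;
            tail = (tail + 1) % QUEUE_SIZE;
            count++;
            pthread_cond_signal(&not_empty);
            pthread_mutex_unlock(&lock);
        }

        pthread_mutex_lock(&lock);
        done = 1;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);

        pthread_join(backend, NULL);
        return 0;
    }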

High Performance Mark
Yes, a modern compiler should use the cores available to it, including any GPUs. I like your idea of pipelining the compiler stages. These are good ideas, and I thank you for sharing them. Is that "excluded by my 'rules'", as you put it? Yes! It's a good answer to a question I didn't ask. The heart of my question is: "What happened in the 1980s to make compilers so much faster?" I should've made that the question title. Modern compilers use the ideas you've all mentioned; they're also using techniques invented in the 1980s -- and *that* is the part of the picture I'm missing.
Thom Boyer
+2  A: 

One issue is what you emit for generated code. You can put pretty much as much compiler time into optimization as you like. Straightforward generation, maybe even looking sort of dumb, will save you time. Of course, back when I used Turbo Pascal and Lightspeed C, the neat part was getting an executable conveniently, not how well optimized it was. Desktop compiler technology, at the time, was seriously behind mainframe compiler technology.

Another thing about Turbo Pascal and Lightspeed C was the integration. Particularly in the days before multitasking home computers, this was great. Unlike the first C compiler I owned (for CP/M), I didn't have to make changes in an editor, close that, do the compile, do the link, and then execute. This may have been part of what you saw: fast execution of components, without the need to type in complicated commands. I can duplicate this now by running multiple terminals on a Gnome desktop: one for vim, one to run gcc, and one to execute in.

Aside from that, reducing I/O is good. Fast lexical analysis is essentially a solved problem nowadays, but not necessarily back then. I'm not sure about parsing, last having delved into that twenty years ago, so somebody else can guide you on fast parsing and such.
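
As a small aside on fast lexing, one common trick is a 256-entry character-class table so the scanner's inner loop does a single array index per character. A minimal, purely illustrative sketch:

    /* Minimal sketch of a table-driven scanner inner loop: one 256-entry
     * character-class table, so classifying a byte is a single array index.
     * (Illustrative; real lexers add keywords, numbers, strings, etc.) */
    #include <stdio.h>
    #include <string.h>

    enum { C_SPACE, C_ALPHA, C_DIGIT, C_OTHER };

    static unsigned char cls[256];

    static void init_classes(void) {
        memset(cls, C_OTHER, sizeof cls);
        cls[' '] = cls['\t'] = cls['\n'] = cls['\r'] = C_SPACE;
        for (int c = 'a'; c <= 'z'; c++) cls[c] = C_ALPHA;
        for (int c = 'A'; c <= 'Z'; c++) cls[c] = C_ALPHA;
        cls['_'] = C_ALPHA;
        for (int c = '0'; c <= '9'; c++) cls[c] = C_DIGIT;
    }

    int main(void) {
        init_classes();
        const char *src = "int x1 = 42;";
        for (const char *p = src; *p; ) {
            unsigned char c = (unsigned char)*p;
            if (cls[c] == C_SPACE) { p++; continue; }
            const char *start = p;
            if (cls[c] == C_ALPHA) {        /* identifier or keyword */
                while (cls[(unsigned char)*p] == C_ALPHA ||
                       cls[(unsigned char)*p] == C_DIGIT) p++;
            } else if (cls[c] == C_DIGIT) { /* number */
                while (cls[(unsigned char)*p] == C_DIGIT) p++;
            } else {                        /* single-character token */
                p++;
            }
            printf("token: %.*s\n", (int)(p - start), start);
        }
        return 0;
    }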

David Thornley
Another factor contributing to the raw speed of Turbo Pascal (the original) was that it was hand coded in assembler.
NealB
+1  A: 

Common wisdom has been that hand-coded, top-down, recursive descent parsers are faster than rule-based LALR(k) parsers such as those built by yacc -- assuming they are coded well -- although LALR(1) can unambiguously parse a larger class of languages. Hand-coded parsers can also give better error messages in some cases.

It's not clear that parsing is the performance bottleneck compared to all the other issues people have been discussing. That is, doing a poor job on file I/O or AST traversal can hurt a lot -- probably much more than you'd pay for using a slightly less efficient parser.

Still, the really fast compilers I'm familiar with used hand-crafted recursive descent parsers.
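
For a feel of why hand-crafted recursive descent is attractive, here is a minimal, purely illustrative expression parser in C. Each grammar rule is just a function, which is part of why these parsers are easy to tune and can give good error messages.

    /* Minimal recursive descent parser/evaluator for expressions such as
     * "2+3*4", with the grammar
     *     expr   -> term   { '+' term }
     *     term   -> factor { '*' factor }
     *     factor -> NUMBER | '(' expr ')'
     * Illustrative only: no error recovery, integers only. */
    #include <ctype.h>
    #include <stdio.h>

    static const char *p;               /* cursor into the input string */

    static long expr(void);             /* forward declaration */

    static long factor(void) {
        if (*p == '(') {
            p++;                        /* consume '(' */
            long v = expr();
            if (*p == ')') p++;         /* consume ')' */
            return v;
        }
        long v = 0;
        while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
        return v;
    }

    static long term(void) {
        long v = factor();
        while (*p == '*') { p++; v *= factor(); }
        return v;
    }

    static long expr(void) {
        long v = term();
        while (*p == '+') { p++; v += term(); }
        return v;
    }

    int main(void) {
        p = "2+3*(4+5)";
        printf("%ld\n", expr());        /* prints 29 */
        return 0;
    }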

Alex Blakemore
A: 

I think one of the big changes with Turbo Pascal was that on many previous compilers/assemblers/linkers, the source code and object code would both be on disk, as would parts of the compiler/assembler/linker. Trying to read and write multiple files at a time on a single drive would often be more than twice as slow as reading or writing one file. Turbo Pascal kept the entire development system in RAM, and in many cases the source or object code would be in RAM as well.

Late in the life of the Commodore 64, there was an assembler called Fast Assembler, which patched the BASIC interpreter to add assembly-language opcodes and a few new directives. The ORG directive would set a target code location and a "pass" flag. If the "pass" flag was clear, each opcode would bump the target code location by the instruction size but wouldn't generate any code and wouldn't complain about out-of-range branches. If the pass flag was set, code would be generated. To assemble a program, one would surround it with a for/next loop that ran three times, with the "pass" flag set on the last iteration. Because everything was kept in RAM, the edit-assemble-test cycle was immensely fast compared with any earlier assemblers.
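
Here is a greatly simplified C sketch of that pass-flag idea (reduced to two passes, with a made-up instruction table): the sizing pass only advances the location counter so forward references get addresses, and the final pass emits the code.

    /* Greatly simplified sketch of the "pass flag" idea: the same assembly
     * loop is run twice; on the first pass only the location counter moves
     * (so labels and forward branches get addresses), and on the final pass
     * code is actually emitted. Instruction sizes are hypothetical. */
    #include <stdio.h>

    typedef struct { const char *mnemonic; int size; } Instr;

    static const Instr program[] = {
        { "LDA #1", 2 },
        { "JMP done", 3 },   /* forward reference: gets an address on pass 1 */
        { "NOP", 1 },
        { "done: RTS", 1 },
    };

    int main(void) {
        int org = 0xC000;
        for (int pass = 0; pass < 2; pass++) {
            int pc = org;
            for (int i = 0; i < 4; i++) {
                if (pass == 1) {
                    /* final pass: emit (here, just print) the code */
                    printf("%04X  %s\n", (unsigned)pc, program[i].mnemonic);
                }
                /* both passes: advance the location counter */
                pc += program[i].size;
            }
        }
        return 0;
    }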

supercat