
I was thinking more about the programming language I am designing, and I was wondering: what are some ways I could minimize its compile time?

+1  A: 

Here's a shot...

Use incremental compilation if your toolchain supports it (make, Visual Studio, etc.).

For example, with GCC and make, if you have many files to compile but change only one of them, then only that one file is recompiled.
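
At its core, the check is just a timestamp comparison: rebuild a unit only when its source is newer than its output. A minimal sketch of that logic (file names hypothetical), assuming C++17's <filesystem>:

    #include <filesystem>
    #include <iostream>

    namespace fs = std::filesystem;

    // Rebuild only when the source was modified after the object file
    // was produced, or when the object file doesn't exist yet.
    bool needs_recompile(const fs::path& source, const fs::path& object) {
        if (!fs::exists(object)) return true;
        return fs::last_write_time(source) > fs::last_write_time(object);
    }

    int main() {
        if (needs_recompile("foo.cpp", "foo.o"))
            std::cout << "compiling foo.cpp\n";
        else
            std::cout << "foo.o is up to date\n";
    }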

krebstar
Unless it's a header file. It's often worth putting a bit of effort into factoring .h files so that they don't change too often. One legacy project we have unnecessarily stores data in a header file that's used everywhere. There are no incremental-compilation gains there.
Edmund
Uh, I think header files aren't compiled... Like, if you try to compile a header file in Visual C++, you get a message saying "No compile tool associated with this extension." Anyway, that's right, but only for C/C++, maybe...
krebstar
+3  A: 

what are ways i could minimize its compile time?

  • No compilation (interpreted language)
  • Delayed (just in time) compilation
  • Incremental compilation
  • Precompiled header files
ChrisW
We did a test here with VS6 a few years ago, and our large project with a 15-minute compile time actually took about 6 seconds *longer* to compile with precompiled header files turned on.
T.E.D.
Now that I look at this answer, I have problems with your other bullets too. The main issue is that they don't actually reduce compile time, but instead move it to another place. That *might* be what he needs, but it's certainly not what he asked for.
T.E.D.
With VS6, IMO you can't afford to use 'auto' for the precompiled header: instead you have to say exactly which *.cpp file generates them (e.g. "stdafx.cpp") and how many of the headers are precompiled.
ChrisW
Hmmm. We just turned it off. :-) I suppose that might bear looking into. But seeing as we are probably the last company in the world still using VS6, and we are migrating away this year, it's probably an academic matter.
T.E.D.
A: 

It depends on what language/platform you're programming for. For .NET development, minimise the number of projects that you have in your solution.

cruizer
A: 

In the old days you could get dramatic speedups by setting up a RAM drive and compiling there. Don't know if this still holds true, though.

pantulis
I mentioned this in my answer too. I'm also curious if anyone still does this.
T.E.D.
A: 

In C++ you could use distributed compilation with tools like IncrediBuild.

madgnome
+1  A: 

Eiffel had the idea of different states of being frozen, and recompiling didn't necessarily mean that the whole class was recompiled.

How much can you break up the compilable modules, and how much do you care to keep track of them?

Greg Domjan
+1  A: 

In most languages (pretty well everything other than C++), compiling individual compilation units is quite fast.

Binding/linking is often what's slow - the linker has to reference the whole program rather than just a single unit.

C++ suffers because - unless you use the pImpl idiom - it requires the implementation details of every object, and all inline functions, to be visible in order to compile client code.
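
A minimal pImpl sketch (class and member names hypothetical): the header exposes only a pointer to an incomplete type, so client code doesn't see - and doesn't recompile on changes to - the implementation:

    // widget.h - all clients ever see; it doesn't change when the
    // implementation does.
    #include <memory>

    class Widget {
    public:
        Widget();
        ~Widget();               // defined where Impl is complete
        void draw();
    private:
        struct Impl;             // incomplete type in the header
        std::unique_ptr<Impl> pimpl;
    };

    // widget.cpp - the implementation details live only here.
    struct Widget::Impl {
        int cached_state = 0;
    };

    Widget::Widget() : pimpl(std::make_unique<Impl>()) {}
    Widget::~Widget() = default;
    void Widget::draw() { ++pimpl->cached_state; }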

Java (source to bytecode) suffers because the grammar doesn't differentiate objects and classes - you have to load the Foo class to see whether Foo.Bar.Baz is the Baz field of the object referenced by the static field Bar of the Foo class, or a static field of the nested Foo.Bar class. You can switch between the two in the source of the Foo class without changing the source of the client code, but you still have to recompile the client code, as the bytecode differentiates between the two forms even though the syntax doesn't. AFAIK Python bytecode doesn't differentiate between the two - modules are true members of their parents.

C++ and C suffer if you include more headers than are required, as the preprocessor has to process each header many times and the compiler has to compile them. Minimizing header size and complexity helps, suggesting that better modularity would improve compilation time. It's not always possible to cache header compilation, as the definitions present when the header is preprocessed can alter its semantics, and even its syntax.

C suffers if you use the preprocessor a lot, but the actual compilation is fast; much C code uses typedef struct _X* X_ptr to hide the implementation better than C++ does - a C header can easily consist of typedefs and function declarations, giving better encapsulation.
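
A sketch of that style (names hypothetical; valid as both C and C++) - the header is nothing but a typedef and function declarations, so the struct layout can change without clients recompiling:

    /* stack.h - no implementation details at all. */
    typedef struct _Stack* Stack_ptr;

    Stack_ptr stack_create(void);
    void      stack_push(Stack_ptr s, int value);
    int       stack_pop(Stack_ptr s);
    void      stack_destroy(Stack_ptr s);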

So I'd suggest making your language hide implementation details from client code, and if it's an OO language with both instance members and namespaces, make the syntax for accessing the two unambiguous. Allow true modules, so client code only has to be aware of the interface rather than implementation details. Don't allow preprocessor macros or other variation mechanisms to alter the semantics of referenced modules.

Pete Kirkham
A: 

A simple one: make sure the compiler can natively take advantage of multi-core CPUs.

Mark Pattison
+5  A: 

Try these rules:

  1. Design your compiler to work in several independent steps. The goal is to be able to run each step in a different thread so you can utilize multi-core CPUs. It will also help to parallelize the whole compile process (i.e. compile more than one file at the same time).

  2. Try to allow files to be compiled independently. For example, create a "missing symbol pool" for the project, so that a missing symbol does not cause a compile failure as such: when you reference a symbol that isn't defined yet, add it to the pool; when its definition turns up, remove it. When all files have been compiled, check that the pool is empty (see the sketch after this list).

  3. Create a cache with important information. For example: File X uses symbols from file Y. This way, you can skip compiling file Z (which doesn't reference anything in Y) when Y changes. If you want to go one step further, put all symbols which are defined anywhere in a pool. If a file changes in such a way that symbols are added/removed, you will know immediately which files are affected (without even opening them).

  4. Compile in the background. Start a compiler process which checks the project directory for changes and compiles them as soon as the user saves a file. This way, you will only have to compile a few files each time instead of everything. In the long run, you will compile much more, but for the user, turnaround times will be much shorter (= time the user has to wait until she can run the compiled result after a change).

  5. Use a "just in time" compiler (i.e. compile a file when it is used, for example in an import statement). Projects are then distributed in source form and compiled when run for the first time. Python does this. To make this perform well, you can precompile the standard library during the installation of your compiler.

  6. Don't use header files. Keep all information in a single place and generate header files from the source if you have to.
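
A minimal sketch of the missing-symbol pool from point 2 (class and method names hypothetical):

    #include <string>
    #include <unordered_set>

    // Tracks symbols that have been referenced but not yet defined.
    // Compiling a unit never fails on an unresolved symbol; the check
    // happens once, after every file has been compiled.
    class SymbolPool {
    public:
        void reference(const std::string& name) {
            if (!defined_.count(name)) missing_.insert(name);
        }
        void define(const std::string& name) {
            defined_.insert(name);
            missing_.erase(name);  // a pending reference is now resolved
        }
        bool all_resolved() const { return missing_.empty(); }

    private:
        std::unordered_set<std::string> defined_;
        std::unordered_set<std::string> missing_;
    };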

Aaron Digulla
+1  A: 
  • Make the grammar simple and unambiguous, and therefore quick and easy to parse.
  • Place strong restrictions on file inclusion.
  • Allow compilation without full information whenever possible (e.g. predeclaration in C and C++; see the sketch below this list).
  • One-pass compilation, if possible.
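
For instance, a C++ forward declaration lets client code compile knowing only that a type exists, not what's in it (names hypothetical):

    // No #include required: the forward declaration is enough
    // information to declare pointers and references to Engine.
    class Engine;

    class Car {
    public:
        void attach(Engine* e) { engine = e; }  // pointer to incomplete type is fine
    private:
        Engine* engine = nullptr;
    };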
Dan Olson
A: 
  • Make sure that everything can be compiled the first time you try to compile it. E.g. ban forward references.
  • Use a context-free grammar so that you can find the correct parse tree without a symbol table.
  • Make sure that the semantics can be deduced from the syntax so you can construct the correct AST directly rather than by mucking with a parse tree and symbol table.
BCS
A: 

How serious a compiler is this?

Unless the syntax is pretty convoluted, the parser should be able to run no more than 10-100 times slower than just indexing through the input file characters.

Similarly, code generation should be limited by output formatting.

You shouldn't be hitting any performance issues unless you're doing a big, serious compiler, capable of handling mega-line apps with lots of header files.

Then you need to worry about precompiled headers, optimization passes, and linking.

Mike Dunlavey
+1  A: 

I've implemented a compiler myself, and ended up having to look at this once people started batch-feeding it hundreds of source files. I was quite surprised by what I found out.

It turns out that the most important thing you can optimize is not your grammar. It's not your lexical analyzer or your parser either. Instead, the most important thing in terms of speed is the code that reads in your source files from disk. I/Os to disk are slow. Really slow. You can pretty much measure your compiler's speed by the number of disk I/Os it performs.

So it turns out that the absolute best thing you can do to speed up a compiler is to read the entire file into memory in one big I/O, do all your lexing, parsing, etc. from RAM, and then write out the result to disk in one big I/O.
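
A minimal sketch of that pattern in C++ (error handling elided) - one bulk read, then everything happens in RAM:

    #include <fstream>
    #include <sstream>
    #include <string>

    // Read the whole source file into memory with a single bulk I/O,
    // so lexing and parsing never touch the disk again.
    std::string slurp(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        std::ostringstream buffer;
        buffer << in.rdbuf();   // one transfer from disk to RAM
        return buffer.str();
    }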

I talked with one of the head guys maintaining GNAT (GCC's Ada compiler) about this, and he told me that he actually used to put everything he could onto RAM disks so that even his file I/O was really just RAM reads and writes.

T.E.D.
I was thinking of using a named pipe instead of a file, and other ways to keep it in RAM. I couldn't think of a way to improve linking. Can I get you to email me at myusername @gmail.com? I have tons of questions for you.
acidzombie24
I wouldn't go through herculean efforts before you have a chance to actually measure things. Premature optimization and whatnot. Just be sure to make the front end to the lexer modular enough to replace with a buffer API later.
T.E.D.
A: 

One thing surprisingly missing in answers so far: make sure you're using a context-free grammar, etc. Have a good hard look at languages designed by Wirth such as Pascal and Modula-2. You don't have to reimplement Pascal, but the grammar design is custom-made for fast compiling. Then see if you can find any old articles about the tricks Anders pulled implementing Turbo Pascal. Hint: table-driven.
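
To illustrate "table-driven": the scanner's decisions live in a lookup table indexed by state and character class, so the inner loop is a single table lookup per input character. A toy sketch (states and classes hypothetical):

    #include <cctype>
    #include <cstddef>
    #include <string>

    // Toy table-driven scanner: 3 states x 3 character classes.
    // Real tables are far larger, but the inner loop stays this simple.
    enum State { START, IN_IDENT, DONE };
    enum CharClass { LETTER, DIGIT, OTHER };

    static const State transition[3][3] = {
        /* START    */ { IN_IDENT, DONE,     DONE },
        /* IN_IDENT */ { IN_IDENT, IN_IDENT, DONE },
        /* DONE     */ { DONE,     DONE,     DONE },
    };

    CharClass classify(char c) {
        if (std::isalpha(static_cast<unsigned char>(c))) return LETTER;
        if (std::isdigit(static_cast<unsigned char>(c))) return DIGIT;
        return OTHER;
    }

    // Scan one identifier starting at `pos`; returns characters consumed.
    std::size_t scan_ident(const std::string& src, std::size_t pos) {
        State s = START;
        std::size_t i = pos;
        while (i < src.size()) {
            State next = transition[s][classify(src[i])];
            if (next == DONE) break;
            s = next;
            ++i;
        }
        return i - pos;
    }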

dwc
A: 

I haven't seen much work done on minimizing compile time, but some ideas do come to mind:

  1. Keep the grammar simple. A convoluted grammar will increase your compile time.
  2. Try making use of parallelism, on either multi-core CPUs or GPUs.
  3. Benchmark a modern compiler, see where the bottlenecks are, and see what you can do in your compiler/language to avoid them.

Unless you are writing a highly specialized language, compile time is not really an issue...

Sridhar Iyer
A: 

Make a build system that doesn't suck!

There are a huge number of programs out there with maybe 3 source files that take under a second to compile, but before you get that far you have to sit through an automake script that takes about 2 minutes checking things like the size of an int. And if you go to compile something else a minute later, it makes you sit through almost exactly the same set of tests.

So unless your compiler is doing awful things to the user like changing the size of its ints or changing basic function implementations between runs, just dump that info out to a file and let them get it in a second instead of 2 minutes.

Ant P.
A: 

Here are some performance tricks that we've learned by measuring compilation speed and what affects it:

  • Write a two-pass compiler: characters to IR, IR to code. (It's easier to write a three-pass compiler that goes characters -> AST -> IR -> code, but it's not as fast.)

  • As a corollary, don't have an optimizer; it's hard to write a fast optimizer.

  • Consider generating bytecode instead of native machine code. The virtual machine for Lua is a good model.

  • Try a linear-scan register allocator or the simple register allocator that Fraser and Hanson used in lcc.

  • In a simple compiler, lexical analysis is often the greatest performance bottleneck. If you are writing C or C++ code, use re2c. If you're using another language (which you will find much more pleasant), read the paper about re2c and apply the lessons learned.

  • Generate code using maximal munch (a toy sketch follows this list), or possibly iburg.

  • Surprisingly, the GNU assembler is a bottleneck in many compilers. If you can generate binary directly, do so. Or check out the New Jersey Machine-Code Toolkit.

  • As noted above, design your language to avoid anything like #include. Either use no interface files or precompile your interface files. This tactic dramatically reduces the burden on the lexer, which as I said is often the biggest bottleneck.
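
To make the maximal-munch bullet concrete, here's a toy sketch (the IR and the printed pseudo-instructions are hypothetical, not from lcc or iburg): at each IR node, try the largest instruction pattern first and fall back to smaller tiles only when it doesn't match:

    #include <cstdio>

    // Toy IR: expression trees of constants, temporaries, and adds.
    // ADD nodes always have two children in this sketch.
    struct Expr {
        enum Kind { CONST, TEMP, ADD } kind;
        int value = 0;              // payload for CONST
        int temp = 0;               // payload for TEMP
        const Expr* lhs = nullptr;
        const Expr* rhs = nullptr;
    };

    static int next_temp = 100;

    // Maximal munch: at every node, try the largest tile first.
    int munch(const Expr* e) {
        // Big tile: ADD(x, CONST) becomes one add-immediate instruction.
        if (e->kind == Expr::ADD && e->rhs->kind == Expr::CONST) {
            int x = munch(e->lhs);
            int t = next_temp++;
            std::printf("t%d = t%d + #%d\n", t, x, e->rhs->value);
            return t;
        }
        // Smaller tiles.
        switch (e->kind) {
        case Expr::CONST: {
            int t = next_temp++;
            std::printf("t%d = #%d\n", t, e->value);
            return t;
        }
        case Expr::TEMP:
            return e->temp;
        default: {  // ADD(x, y) with a non-constant right-hand side
            int a = munch(e->lhs), b = munch(e->rhs), t = next_temp++;
            std::printf("t%d = t%d + t%d\n", t, a, b);
            return t;
        }
        }
    }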

Norman Ramsey
IR means what? I am looking up this info now. Isn't lexical analysis no longer a problem? I thought I/O was the main problem (as in, your analysis reads too many files, not the actual complexity). I plan for my language to be complex.
acidzombie24
IR means intermediate representation. In a simple compiler, lexing is still expensive; the loop touches every character of the input. You want *both* to keep total input size down *and* to make lexing efficient. A complex grammar is OK.
Norman Ramsey