views: 1103, answers: 20

I once worked on a C++ project that took about an hour and a half for a full rebuild. Small edit, build, test cycles took about 5 to 10 minutes. It was an unproductive nightmare.

What are the worst build times you ever had to handle?

What strategies have you used to improve build times on large projects?

Update:

How much do you think the language used is to blame for the problem? I think C++ is prone to massive dependencies on large projects, which often means even simple changes to the source code can result in a massive rebuild. Which language do you think copes with large project dependency issues best?

+32  A: 
  1. Forward declaration
  2. pimpl idiom (see the sketch after this list)
  3. Precompiled headers
  4. Parallel compilation (e.g. MPCL add-in for Visual Studio).
  5. Distributed compilation (e.g. Incredibuild for Visual Studio).
  6. Incremental build
  7. Split the build into several "projects" so you don't compile all the code when it isn't needed.

[Later Edit] 8. Buy faster machines.
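
To make points 1 and 2 concrete, here is a minimal sketch (the Widget/Renderer names are invented for illustration): the header forward-declares what it can and hides the private members behind a pointer, so most edits to the implementation recompile only one .cpp.

    // widget.h - clients include only this header; it pulls in no heavy dependencies.
    #pragma once
    #include <memory>

    class Renderer;                      // forward declaration instead of #include "renderer.h"

    class Widget {
    public:
        Widget();
        ~Widget();                       // defined in the .cpp, where Impl is complete
        void draw(Renderer& r);
    private:
        struct Impl;                     // pimpl: private details live in widget.cpp
        std::unique_ptr<Impl> impl_;
    };

    // widget.cpp - the only file that must be rebuilt when the internals change.
    #include "widget.h"
    #include "renderer.h"                // heavy headers stay confined to this file

    struct Widget::Impl {
        int cached_state = 0;
    };

    Widget::Widget() : impl_(std::make_unique<Impl>()) {}
    Widget::~Widget() = default;

    void Widget::draw(Renderer&) { /* use impl_ and the renderer here */ }

With this split, most edits touch only widget.cpp; files that include widget.h neither recompile nor need to see Renderer's header.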

Cătălin Pitiș
+1 Nice one catalin
Warrior
Great list. I would also add "Buy faster machines"
demoncodemonkey
Done :). Good suggestion.
Cătălin Pitiș
I will second Incredibuild. For our project the build time went from 2 hours to about 10 minutes with 6-7 developer machines providing 1 CPU each.
omerkudat
I wouldn't say faster machines as much as faster drives - on one of my extremely large C++ projects, adding RAID sped things up by a very large amount. Unrelatedly, if one is doing incremental builds on a large Visual C++ project that's not split into DLLs, the link stage will actually be faster if you turn off Incremental Linking.
Not Sure
@Not Sure: I've found that VC8+ is much less disk-bound than VC6 was. Building in parallel seems to be the "biggest bang for the buck" now.
peterchen
+11  A: 

My strategy is pretty simple - I don't do large projects. The whole thrust of modern computing is away from the giant and monolithic and towards the small and componentised. So when I work on projects, I break things up into libraries and other components that can be built and tested independently, and which have minimal dependencies on each other. A "full build" in this kind of environment never actually takes place, so there is no problem.

anon
How do you write an operating system? How do you write a compiler? Not everything is simple.
Edouard A.
Well, I have written a couple of compilers. They are modular too.
anon
An operating system kernel is mostly drivers, which are built separately. And the kernel is just a small part of an operating system, which in unix is hundreds of userland applications. Toolchains are made of an assembler, compiler, linker, various tools like nm and objdump, etc.
KeyserSoze
Nice to hear that the Unix philosophy of the 80's starts getting called "the thrust of modern computing" these days.
slacker
@slacker Well, these things take time to catch on. I was a UNIX programmer in the 80s, BTW.
anon
+2  A: 
  1. Fiddle with the compiler optimisation flags
  2. use the -j4 option with gmake for parallel compilation (multicore or single core)
  3. if you are using clearmake, use winking
  4. take out the debug flags in extreme cases
  5. Use some powerful servers.
Warrior
+1  A: 

Powerful compilation machines and parallel compilers. We also make sure the full build is needed as little as possible. We don't alter the code to make it compile faster.

Efficiency and correctness is more important than compilation speed.

Edouard A.
+1  A: 

In Visual Studio, you can set the number of projects to compile at a time. Its default value is 2; increasing it can cut some time.

This will help if you don't want to mess with the code.

vrrathod
+1  A: 

This is the list of things we did for development under Linux:

  • As Warrior noted, use parallel builds (make -jN)
  • We use distributed builds (currently icecream, which is very easy to set up); with this we can have tens of processors at a given time. This also has the advantage of giving the builds to the most powerful and least loaded machines.
  • We use ccache so that when you do a make clean, you don't really have to recompile sources that didn't change; the compiled output is copied from the cache.
  • Note also that debug builds are usually faster to compile, since the compiler doesn't have to perform optimisations.
Etienne PIERRE
caching after a clean? I only ever run clean when I want to check the build system or when I really do want to rebuild from scratch.
BCS
Sometimes you do a clean because the dependencies changed (for example you renamed a header and the dependencies stored in the computed dependency files get lost). Another useful use of ccache is when the only change made to a file consists of documenting the code (adding doxygen comments to the functions); this doesn't change the preprocessed file and thus doesn't need recompilation.
Etienne PIERRE
+3  A: 
  1. Multi-core compilation. Very fast with 8 cores compiling on the i7.
  2. Incremental linking
  3. External constants
  4. Removed inline methods on C++ classes.

The last two gave us a reduced linking time from around 12 minutes to 1-2 minutes. Note that this is only needed if things have huge visibility, i.e. are seen "everywhere", and if there are many different constants and classes.
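
As a hedged sketch of what points 3 and 4 can look like in practice (the names are made up): constants and method bodies move out of widely included headers, so changing a value or an implementation detail recompiles one .cpp and relinks, instead of recompiling every file that includes the header.

    // constants.h - included almost everywhere, so it holds declarations only.
    #pragma once
    extern const int kMaxRetries;        // value is no longer visible to every includer

    // constants.cpp - the only translation unit rebuilt when the value changes.
    #include "constants.h"
    const int kMaxRetries = 5;           // the extern declaration above gives this external linkage

    // Likewise, moving a method body from the class definition in the header into
    // the .cpp (de-inlining it) means callers only relink, they don't recompile.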

Cheers

Magnus Skog
+1  A: 

We tried creating proxy classes once.

These are really simplified versions of a class that include only the public interface, reducing the number of internal dependencies that need to be exposed in the header file. However, they came at the heavy price of spreading each class over several files that all needed to be updated when changes to the class interface were made.
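
For illustration (class names are hypothetical), a proxy along these lines is essentially an abstract interface header: clients depend only on it, and the concrete class with all its heavy includes lives behind a factory, at the cost of keeping the two declarations in sync.

    // audio_player.h - the slim proxy/interface header that clients include.
    #pragma once

    class AudioPlayer {
    public:
        virtual ~AudioPlayer() = default;
        virtual void play() = 0;
        virtual void stop() = 0;
    };

    AudioPlayer* createAudioPlayer();    // factory, defined alongside the concrete class

    // audio_player_impl.cpp would #include the heavy vendor headers and define a
    // ConcreteAudioPlayer deriving from AudioPlayer, which the factory returns.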

rikh
+5  A: 

One trick that sometimes helps is to include everything into one .cpp file. Since each header is then parsed only once, instead of once per source file, this can save you a lot of time. (The downside is that it makes it impossible for the compiler to parallelize compilation.)
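
A "unity" translation unit is just a .cpp that includes the others; the file names below are invented. Shared headers then get parsed once for the whole group rather than once per source file, though file-static names and anonymous namespaces from different files can start to clash.

    // unity_core.cpp - compiled instead of the individual .cpp files it lists.
    #include "lexer.cpp"
    #include "parser.cpp"
    #include "ast.cpp"
    #include "codegen.cpp"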

You should be able to specify that multiple .cpp files should be compiled in parallel (-j with make on Linux, /MP on MSVC - MSVC also has an option to compile multiple projects in parallel. These are separate options, and there's no reason why you shouldn't use both).

In the same vein, distributed builds (Incredibuild, for example), may help take the load off a single system.

SSD disks are supposed to be a big win, although I haven't tested this myself (but a C++ build touches a huge number of files, which can quickly become a bottleneck).

Precompiled headers can help too, when used with care. (They can also hurt you, if they have to be recompiled too often).
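
A minimal precompiled-header sketch, with arbitrary file names: collect big, rarely changing headers into one file that every .cpp includes first. With MSVC this is the classic stdafx.h pattern driven by /Yc and /Yu; with GCC you can precompile the header directly (g++ -x c++-header pch.h), and the resulting pch.h.gch is picked up automatically when the header is included.

    // pch.h - only large, stable headers belong here; touching this file rebuilds everything.
    #pragma once
    #include <map>
    #include <string>
    #include <vector>
    // #include <windows.h>              // large platform / third-party headers too, if they rarely change

    // Every .cpp in the project then starts with:
    //   #include "pch.h"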

And finally, trying to minimize dependencies in the code itself is important. Use the pImpl idiom, use forward declarations, keep the code as modular as possible. In some cases, use of templates may help you decouple classes and minimize dependencies. (In other cases, templates can slow down compilation significantly, of course)

But yes, you're right, this is very much a language thing. I don't know of another language which suffers from the problem to this extent. Most languages have a module system that allows them to eliminate header files, which are a huge factor. C has header files, but is such a simple language that compile times are still manageable. C++ gets the worst of both worlds: a big, complex language, and a terribly primitive build mechanism that requires a huge amount of code to be parsed again and again.

jalf
I agree. I had a C project of similar size to the C++ project I mentioned in the question. Full rebuild was only 20 seconds, but I had put some effort into keeping dependencies low due to the bad experience I had had before.
rikh
+3  A: 

Unity Builds

Incredibuild

Pointer to implementation

forward declarations

compiling "finished" sections of the proejct into dll's

Stowelly
+3  A: 

IncrediBuild

Yuval A
+1  A: 

In general large C++ projects that I've worked on that had slow build times were pretty messy, with lots of interdependencies scattered through the code (the same include files used in most cpps, fat interfaces instead of slim ones). In those cases, the slow build time was just a symptom of the larger problem, and a minor symptom at that. Refactoring to make clearer interfaces and break code out into libraries improved the architecture, as well as the build time. When you make a library, it forces you to think about what is an interface and what isn't, which will actually (in my experience) end up improving the code base. If there's no technical reason to have to divide the code, some programmers through the course of maintenance will just throw anything into any header file.

KeyserSoze
+2  A: 

The best suggestion is to build makefiles that actually understand dependencies and do not automatically rebuild the world for a small change. But, if a full rebuild takes 90 minutes, and a small rebuild takes 5-10 minutes, odds are good that your build system already does that.

Can the build be done in parallel? Either with multiple cores, or with multiple servers?

Check in pre-compiled bits for pieces that really are static and do not need to be rebuilt every time. 3rd party tools/libraries that are used, but not altered, are good candidates for this treatment.

Limit the build to a single 'stream' if applicable. The 'full product' might include things like a debug version, or both 32 and 64 bit versions, or may include help files or man pages that are derived/built every time. Removing components that are not necessary for development can dramatically reduce the build time.

Does the build also package the product? Is that really required for development and testing? Does the build incorporate some basic sanity tests that can be skipped?

Finally, you can re-factor the code base to be more modular and to have fewer dependencies. Large Scale C++ Software Design is an excellent reference for learning to decouple large software products into something that is easier to maintain and faster to build.

EDIT: Building on a local filesystem as opposed to a NFS mounted filesystem can also dramatically speed up build times.

semiuseless
+1 for the book reference.
rikh
+2  A: 

ccache & distcc (for C/C++ projects) -

ccache caches compiled output, using the pre-processed file as the 'key' for finding the output. This is great because pre-processing is pretty quick, and quite often changes that force a recompile don't actually change the source for many files. Also, it really speeds up a full re-compile. Also nice is that you can have a shared cache among team members. This means that only the first guy to grab the latest code actually compiles anything.

distcc does distributed compilation across a network of machines. This is only good if you HAVE a network of machines to use for compilation. It goes well with ccache, and only moves the pre-processed source around, so the only thing you have to worry about on the compiler engine systems is that they have the right compiler (no need for headers or your entire source tree to be visible).

Michael Kohne
+2  A: 

The book Large-Scale C++ Software Design has very good advice that I've used in past projects.

themis
+1 This is one of the only books that really tackles the issue beyond suggesting the use of Pimpls
the_mandrill
A: 

Full build is about 2 hours. I try to avoid making modifications to the base classes, and since my work is mainly on the implementation of these base classes I only need to build small components (a couple of minutes).

cmdev
A: 

Cătălin Pitiș covered a lot of good things. Other ones we do:

  • Have a tool that generates reduced Visual Studio .sln files for people working in a specific sub-area of a very large overall project
  • Cache DLLs and PDBs built on CI for distribution to developer machines
  • For CI, make sure that the link machine in particular has lots of memory and high-end drives
  • Store some expensive-to-regenerate files in source control, even though they could be created as part of the build
  • Replace Visual Studio's checking of what needs to be relinked with our own script tailored to our circumstances
Jonathan Moore
A: 

It's a pet peeve of mine, so even though you already accepted an excellent answer, I'll chime in:

In C++, it's less the language as such than the language-mandated build model, which was great back in the seventies, and the header-heavy libraries.

The only thing that is wrong about Cătălin Pitiș' reply: "buy faster machines" should go first. It is the easiest way with the least impact.

My worst was about 80 minutes on an aging build machine running VC6 on W2K Professional. The same project (with tons of new code) now takes under 6 minutes on a machine with 4 hyperthreaded cores, 8 GB RAM, Win 7 x64 and decent disks. (A similar machine, with about 10-20% less processor power, 4 GB RAM and Vista x86, takes twice as long.)

Strangely, incremental builds are slower than full rebuilds most of the time now.

peterchen
A: 

Create some unit test projects to test individual libraries, so that if you need to edit low level classes that would cause a huge rebuild, you can use TDD to know your new code works before you rebuild the entire app. The John Lakos book as mentioned by Themis has some very practical advice for restructuring your libraries to make this possible.
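
As a rough sketch (the library, header and class here are made up, and it uses plain assert rather than any particular test framework), the idea is a small test executable that links only against the low-level library, so you can edit, build and verify that code in seconds without rebuilding the whole application:

    // test_geometry.cpp - links only against the small geometry library under test.
    #include <cassert>
    #include "geometry/vector2.h"        // hypothetical low-level header being edited

    int main() {
        Vector2 v{3.0, 4.0};
        assert(v.length() == 5.0);       // 3-4-5 triangle, so the result is exact
        return 0;
    }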

the_mandrill
A: 
  1. Minimize your public API
  2. Minimize inline functions in your API. (Unfortunately this also increases linker requirements).
  3. Maximize forward declarations.
  4. Reduce coupling between code. For instance, pass two integers to a function for coordinates, instead of your custom Point class that has its own header file.
  5. Use Incredibuild. But it has some issues sometimes.
  6. Do NOT put code that gets exported from two different modules in the SAME header file.
  7. Use the PImpl idiom. Mentioned before, but bears repeating.
  8. Use Pre-compiled headers.
  9. Avoid C++/CLI (i.e. managed C++). Linker times are impacted too.
  10. Avoid using a global header file that includes 'everything else' in your API.
  11. Don't put a dependency on a lib file if your code doesn't really need it.
  12. Know the difference between including files with quotes and angle brackets.
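
On point 12, roughly speaking (the exact search order varies by compiler): #include "..." looks in the directory of the including file first and then falls back to the normal include path, while #include <...> goes straight to the configured include directories, which affects both which header you actually pick up and how much directory searching the preprocessor does.

    // main.cpp
    #include "config.h"    // quoted: looked up relative to this file first, then on the -I / system path
    #include <vector>      // angle brackets: looked up only on the compiler's include path
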
C Johnson