views: 4331
answers: 11

Compiling a C++ file takes a very long time compared to C# or Java. It takes significantly longer to compile a C++ file than it would to run a normal-sized Python script. I'm currently using VC++, but it's the same with any compiler. Why is this?

The two reasons I could think of were loading header files and running the preprocessor, but neither seems like it should explain why it takes so long.

A: 

Some reasons are:

1) The C++ grammar is more complex than that of C# or Java and takes more time to parse (see the example below).

2) (More importantly) The C++ compiler produces machine code and does all optimizations during compilation. C# and Java go only halfway and leave these steps to the JIT.
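
To illustrate the first point, here is a minimal sketch (the identifiers are made up for illustration) of why C++ parsing depends on context: the compiler cannot tell what a statement means without tracking every declaration that came before it.

struct Widget {};

int main()
{
   // The "most vexing parse": this declares a function named w that
   // takes no arguments and returns a Widget -- it does not define a
   // default-constructed Widget variable.
   Widget w();

   // Similarly, whether "a * b;" is a multiplication expression or a
   // declaration of b as a pointer to type a depends entirely on
   // whether 'a' was previously declared as a type or as a variable.
   return 0;
}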

Nemanja Trifunovic
+7  A: 

C++ is compiled into machine code. So you have the pre-processor, the compiler, the optimizer, and finally the assembler, all of which have to run.

Java and C# are compiled into byte-code/IL, which the Java virtual machine/.NET runtime then executes (or JIT-compiles into machine code) at run time.

Python is an interpreted language that is also compiled into byte-code.

I'm sure there are other reasons for this as well, but in general, not having to compile to native machine language saves time.

Alan
The cost added by pre-processing is trivial. The major "other reason" for a slowdown is that compilation is split into separate tasks (one per object file), so common headers get processed over and over again. That's O(N^2) worst-case, vs. the O(N) parsing time of most other languages.
Tom
Also, linking takes time, right?
ericmj
+5  A: 

Another reason is the use of the C pre-processor for locating declarations. Even with header guards, .h files still have to be parsed over and over, every time they're included. Some compilers support pre-compiled headers that can help with this, but they are not always used.
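
For reference, a typical header guard looks like the sketch below (the file and macro names are hypothetical). Note that a guard only prevents the contents from being compiled twice into the same translation unit; the file still has to be opened and scanned on every include.

// bigclass.h -- hypothetical header
#ifndef BIGCLASS_H
#define BIGCLASS_H

class BigClass
{
   // ...
};

#endif // BIGCLASS_H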

See also: C++ Frequently Questioned Answers

Dave Ray
I think you should bold the comment on precompiled headers to point out this IMPORTANT part of your answer.
Kevin
If the whole header file (except possible comments and empty lines) is within the header guards, gcc is able to remember the file and skip it if the correct symbol is defined.
CesarB
@CesarB: It still has to process it in full once per compilation unit (.cpp file).
280Z28
+3  A: 

A compiled language is always going to require a bigger initial overhead than an interpreted language. In addition, perhaps you didn't structure your C++ code very well. For example:

#include "BigClass.h"

class SmallClass
{
   BigClass m_bigClass;
};

Compiles a lot slower than:

class BigClass;

class SmallClass
{
   BigClass* m_bigClass;
};
Andy Brice
Especially true if BigClass happens to include 5 more files that it uses, eventually including all the code in your program.
Tom Leys
This is perhaps one reason. But Pascal, for example, takes just a tenth of the compile time of an equivalent C++ program. This is not because gcc's optimizations take longer, but rather because Pascal is easier to parse and doesn't have to deal with a preprocessor. Also see the Digital Mars D compiler.
Daniel W
+9  A: 
  • The grammar of C++ is very complex.
  • C++ is compiled into machine code, so code generation, heavy optimization, and linking are all done while you are building.
  • C++ programs include headers, and unless you use precompiled headers, the compiler has to compile all of them again with every build. With templates, code has to be left in headers, so there is an additional burden for the compiler to master on each build.
  • Through the use of templates, type names generated by the compiler can become a couple of megabytes big. Processing those is quite time-consuming.
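
As a rough illustration of the last point (a sketch, not taken from the answer itself): once all defaulted template arguments are filled in, even an ordinary nested container expands into an enormous type, and the mangled symbol names grow accordingly.

#include <map>
#include <string>
#include <vector>

// After default template arguments are expanded, the type below
// becomes something like std::map<std::basic_string<char,
// std::char_traits<char>, std::allocator<char> >, std::vector<...>,
// std::less<...>, std::allocator<...> >, and the compiler and linker
// have to generate and compare symbol names of that size.
std::map<std::string, std::vector<std::string> > table;

int main()
{
   table["key"].push_back("value");
   return 0;
}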
Johannes Schaub - litb
+166  A: 

Several reasons:

  • Header files: Every single compilation unit requires hundreds or even thousands of headers to be 1: loaded, and 2: compiled. Every one of them typically has to be recompiled for every compilation unit, because, thanks to the preprocessor, the result of compiling a header might vary between compilation units. (A macro may be defined in one compilation unit which changes the content of the header.)

    This is probably the main reason, as it requires huge amounts of code to be compiled for every compilation unit, and every header additionally has to be compiled multiple times (once for every compilation unit that includes it).

  • Linking: Once compiled, all the object files have to be linked together. This is basically a monolithic process that can't very well be parallelized, and has to process your entire project.

  • Parsing: The syntax is extremely complicated to parse, depends heavily on context, and is very hard to disambiguate. This takes a lot of time.

  • Templates: In C#, List<T> is the only type that is compiled, no matter how many instantiations of List you have in your program. In C++, vector<int> is a completely separate type from vector<float>, and each one has to be compiled separately.

    Add to this that templates make up a full Turing-complete "sub-language" that the compiler has to interpret, and this can become ridiculously complicated (see the sketch after this list). Even relatively simple template metaprogramming code can define recursive templates that create dozens and dozens of template instantiations. Templates may also result in extremely complex types, with ridiculously long names, adding a lot of extra work to the linker. (It has to compare a lot of symbol names, and if those names can grow into many thousands of characters, that can become fairly expensive.)

    And of course, they exacerbate the problems with header files, because templates generally have to be defined in headers, which means far more code has to be parsed and compiled for every compilation unit. In plain C code, a header typically only contains forward declarations, but very little actual code. In C++, it is not uncommon for almost all the code to reside in header files.

  • Optimization: C++ allows some very dramatic optimizations. C# or Java don't allow classes to be completely eliminated (they have to be there for reflection purposes), but even a simple C++ template metaprogram can easily generate dozens or hundreds of classes, all of which are inlined and eliminated again in the optimization phase.

    Moreover, a C++ program must be fully optimized by the compiler. A C# program can rely on the JIT compiler to perform additional optimizations at load time; C++ doesn't get any such "second chances". What the compiler generates is as optimized as it's going to get.

  • Machine code: C++ is compiled to machine code which may be somewhat more complicated than the bytecode Java or .NET use (especially in the case of x86).
    (This is mentioned for completeness only, because it came up in comments and other answers. In practice, this step is unlikely to take more than a tiny fraction of the total compilation time.)
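
To illustrate the templates point above, here is the classic compile-time factorial, a minimal sketch: every distinct argument value forces the compiler to instantiate one more class, so a single expression drags in a whole chain of instantiations.

#include <iostream>

// Each distinct N produces a separate class that the compiler must
// instantiate: Factorial<5> pulls in Factorial<4>, Factorial<3>, and
// so on down to the specialization for 0.
template <unsigned N>
struct Factorial
{
   static const unsigned long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0>
{
   static const unsigned long value = 1;
};

int main()
{
   // Computed entirely at compile time: six class instantiations for
   // a single expression.
   std::cout << Factorial<5>::value << std::endl;  // prints 120
   return 0;
}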

Most of these factors are shared by C code, which actually compiles fairly efficiently. The parsing step is a lot more complicated in C++, and can take up significantly more time, but the main offender is probably templates. They're useful, and make C++ a far more powerful language, but they also take their toll in terms of compilation speed.

jalf
couldn't have said it better...
Nils Pipenbrinck
Regarding point 3: C compilation is noticeably faster than C++. It's definitely the frontend that causes the slowdown, not the code generation.
Tom
Agreed, like I said, this is a very small factor. I only mentioned it because I saw it mentioned in some of the other responses, and by mentioning it here out of completeness, I could at least point out that it wasn't a big deal. :)
jalf
Moved point 3 to the bottom and rephrased it a bit.
jalf
Regarding templates: not only must vector<int> be compiled separately from vector<double>, but vector<int> is recompiled in each compilation unit that uses it. Redundant definitions are eliminated by the linker.
David Rodríguez - dribeas
dribeas: True, but that's not specific for templates. Inline functions or anything else defined in headers will be recompiled everywhere it's included. But yeah, that's especially painful with templates. :)
jalf
Regarding point 1: can't compiled header files be cached, perhaps once per macro configuration?
configurator
@configurator: Yes, they can be cached. Visual Studio does this, but I don't know the details. I think gcc doesn't do any caching by default, but it seems to be possible.
Thomas
@configurator: Visual Studio and gcc both allow for precompiled headers, which can bring some serious speed-ups to the compilation.
small_duck
Thomas: Got a link for that? I wasn't aware of VS doing any form of caching of headers. It does seem like an obvious optimization though. (unless you meant precompiled headers. I was thinking something more general should be possible)
jalf
In our experience it is especially templates which are hard (slow) to compile - in our project to the point where precompiled headers no longer matter. The more we use templates, and the more we do advanced stuff with them (like multiple encapsulation levels, traits, policies, or even metaprogramming), the longer the compilation takes.
Suma
I think the first 2 reasons you listed are the major causes, and a unity build would largely solve them.
lz_prgmr
+5  A: 

The slowdown is not necessarily the same with any compiler.

I haven't used Delphi or Kylix, but back in the MS-DOS days, a Turbo Pascal program would compile almost instantaneously, while the equivalent Turbo C++ program would just crawl.

The two main differences were a very strong module system and a syntax that allowed single-pass compilation.

It's certainly possible that compilation speed just hasn't been a priority for C++ compiler developers, but there are also some inherent complications in the C/C++ syntax that make it more difficult to process. (I'm not an expert on C, but Walter Bright is, and after building various commercial C/C++ compilers, he created the D language. One of his changes was to enforce a context-free grammar to make the language easier to parse.)

Also, you'll notice that generally Makefiles are set up so that every file is compiled separately in C, so if 10 source files all use the same include file, that include file is processed 10 times.

+2  A: 

Parsing and code generation are actually rather fast. The real problem is opening and closing files. Remember, even with include guards, the compiler still has to open the .h file and read each line (and then ignore it).

A friend once (while bored at work) took his company's application and put everything -- all source and header files -- into one big file. Compile time dropped from 3 hours to 7 minutes.
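
What the friend built is essentially a "unity build". A minimal sketch of the idea (the file names are hypothetical): a single translation unit that includes every source file, so each shared header is opened and parsed only once for the whole project.

// unity.cpp -- hypothetical single translation unit for the whole
// project. Each shared header is now opened and parsed exactly once,
// instead of once per source file.
#include "parser.cpp"
#include "codegen.cpp"
#include "emitter.cpp"
// ...and so on for the rest of the project's .cpp files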

James Curran
Well, file access sure has a hand in this but as jalf said, the main reason for this will be something else, namely the repeated parsing of many, many, many (nested!) header files that completely drops out in your case.
Konrad Rudolph
It is at that point that your friend needs to set up precompiled headers, break dependencies between different header files (try to avoid one header including another; forward declare instead), and get a faster HDD. That aside, a pretty amazing metric.
Tom Leys
If the whole header file (except possible comments and empty lines) is within the header guards, gcc is able to remember the file and skip it if the correct symbol is defined.
CesarB
Parsing is a big deal. For N pairs of similarly-sized source/header files with interdependencies, there are O(N^2) passes through header files. Putting all the text into a single file cuts down that duplicate parsing.
Tom
A: 

The trade-off you are getting is that the program runs a wee bit faster. That may be cold comfort to you during development, but it could matter a great deal once development is complete and the program is just being run by users.

T.E.D.
A: 

The biggest issues are:

1) The infinite header reparsing. Already mentioned.

2) The fact that the toolchain is often split into multiple binaries (make, preprocessor, compiler, assembler, archiver, impdef, linker, and dlltool in extreme cases) that all have to be reinitialized for every file (preprocessor, compiler, assembler) or every few files (archiver, linker, and dlltool).

See also this discussion on comp.compilers: http://compilers.iecc.com/comparch/article/03-11-078, especially this one:

http://compilers.iecc.com/comparch/article/02-07-128

Note that John, the moderator of comp.compilers, seems to agree. This means it should be possible to achieve similar speeds for C too, if one integrates the toolchain fully and implements precompiled headers. Many commercial C compilers do this to some degree.

Marco van de Voort
A: 

Most answers are a bit unclear in mentioning that C# will always run slower, because it pays at run time for actions that C++ performs only once, at compile time. This cost is compounded by runtime dependencies (more things to load before the program can run), and C# programs will always have a higher memory footprint. All of this means performance is more closely tied to the capability of the available hardware. The same is true of other languages that are interpreted or depend on a VM.

Panic