I am very interested in any studies or empirical data that show a comparison of compilation times between two C++ projects that are the same except that one uses forward declarations where possible and the other uses none.

How drastically can forward declarations change compilation time as compared to full includes?

#include "myClass.h"

vs.

class myClass;
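
To make the comparison concrete, here is a minimal sketch of the substitution I have in mind (Widget and the member names are just illustrative):

// widget.h
class myClass;                    // forward declaration: no #include needed

class Widget {
public:
    void attach(myClass* target); // pointers and references to an
private:                          // incomplete type are fine here
    myClass* target_;
};

// widget.cpp
#include "widget.h"
#include "myClass.h"              // the full definition is pulled in only here

void Widget::attach(myClass* target) { target_ = target; }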

Are there any studies that examine this?

I realize that this is a vague question that greatly depends on the project. I don't expect a hard number for an answer. Rather, I'm hoping someone may be able to direct me to a study about this.

The project I'm specifically worried about has about 1200 files. Each cpp on average has 5 headers included. Each header has on average 5 headers included. This nesting goes about 4 levels deep. It would seem that for each cpp compiled, around 300 headers must be opened and parsed, some many times. (There are many duplicates in the include tree.) There are include guards, but the files are still opened. Each cpp is separately compiled with gcc, so there's no header caching.
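
As a back-of-the-envelope check: a fan-out of 5 includes repeated over 4 levels gives at most 5 + 25 + 125 + 625 = 780 include directives per cpp; sharing of headers across branches collapses that to the roughly 300 distinct files mentioned above.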

To be sure no one misunderstands, I certainly advocate using forward declarations where possible. My employer, however, has banned them. I'm trying to argue against that position.

Thank you for any information.

+5  A: 

Forward declarations can make for neater, more understandable code, which surely HAS to be the goal of any such decision.

Couple that with the fact that it's quite possible for two classes to rely upon each other, which makes it rather hard NOT to use forward declarations without causing a nightmare.
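
A quick illustration of the mutual-dependency case (the names are hypothetical):

// a.h
class B;               // forward declaration breaks the cycle

class A {
    B* peer;           // A only needs to know that B exists
};

// b.h
class A;               // likewise, no #include "a.h" needed

class B {
    A* peer;
};

Without forward declarations this cycle cannot be expressed with #includes alone, since each header would need the other to be processed first.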

Equally, forward declaration of classes in a header means that you only need to include the relevant headers in the cpps that actually USE those classes. That actually DECREASES compile time.

Edit: Given your comment above, I would point out that it is ALWAYS slower to include a header file than to forward declare. Any time you include a header you force a load from disk, often only to find out that the header guards mean nothing happens. That wastes immense amounts of time, and banning forward declarations is really a VERY stupid rule to be bringing in.

Edit 2: Hard data is pretty hard to obtain. Anecdotally, I once worked on a project that wasn't strict about its header includes, and the build time was roughly 45 minutes on a 512MB RAM P3 500MHz machine (this was a while back). After spending 2 weeks cutting down the include nightmare (by using forward declarations) I managed to get the code to build in a little under 4 minutes. Subsequently, using forward declarations became a rule wherever possible.

Edit 3: It's also worth bearing in mind that there is a huge advantage to using forward declarations when it comes to making small modifications to your code. If headers are included all over the shop, then a modification to a header file can cause vast numbers of files to be rebuilt.

I also note lots of other people extolling the virtues of pre-compiled headers (PCHs). They have their place and they can really help, but they shouldn't be used as an alternative to proper forward declaration. Otherwise modifications to header files can cause recompilation of lots of files (as mentioned above) as well as triggering a PCH rebuild. PCHs can provide a big win for things like libraries that are pre-built, but they are no reason not to use proper forward declarations.

Goz
How do forward declarations make for neater code? I'd argue that they significantly obfuscate code and hide dependencies, making it far more difficult to understand code.
James McNellis
@James: Well it depends on whether the forward declaration is just marking a function that is called later in the same file (in this case it can mean you can structure your code far more sensibly by grouping functions together that otherwise would have interdependency nightmares). That makes code neater, IMO.
Goz
That last paragraph touches on what I'm after. I expect the decrease to be quite significant, but I'd like some hard data to back that up. I was hoping to obtain some rather than making my own.
JoshD
Edit 2 is fantastic. Granted it's anecdotal, but that's still better than nothing. Edit 1: I completely agree.
JoshD
*Any time you include a header you are necessitating a load from disk often only to find out that the header guards mean that nothing happens.* Not true. If the header has already been loaded for that TU, chances are good it's in the OS-level filesystem cache; and even if not, compilers can recognize headers that use include guards and optimize that case.
Roger Pate
I suspect you're talking about msvc's pch model (which is per-TU), which differs from gcc's (which is per header).
Roger Pate
@Roger: Even if it's pre-compiled per TU, a change to a header file that is included somewhere in the massive chain of header file nightmares will surely cause a rebuild of that TU? And as it's included all over the place, it will trigger rebuilds of multiple TUs ... surely? Equally, I've always wondered why they don't just cache the header guard details, but it seems, to me, that they don't. Equally though, there will necessarily be a whole load of associated processing if it doesn't get header-guarded out. Much of it unnecessary.
Goz
@Goz: Yes, that's how msvc's PCHs work: the set of pre-compiled headers is the same for all TUs, and all TUs must use the single PCH. Yes, changing anything included in that "PCH bundle" means you have to recompile every TU. This is why I'm thinking you're talking about msvc's pch model. Gcc has [cached header guards](http://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html) for years, and I've heard the latest version(s) of msvc can also do that. (I guess "per-TU" could be interpreted various ways; should've been more clear about that.)
Roger Pate
So are you saying that changing a header in the PCH bundle doesn't cause lots of rebuilds under GCC? Because that's what I'm trying, maybe unsuccessfully, to say :) As for cached header guards, fair enough ... however it's still going to be slower (marginally) to check against a cache than to not do so at all ...
Goz
@Goz: (I don't get the notification without @Roger, btw.) Gcc's pre-compiled header model *doesn't have* a pch bundle: each header is independently compiled and cached, then the cached version is checked when the header is included anywhere. (I'm not saying gcc's model is better than msvc's or vice versa; both have advantages.) You can't compare checking a cache against not doing anything; you have to compare checking a cache against manually keeping that information yourself. (And which of those two makes more sense depends on the project/situation specifics.)
Roger Pate
@Roger: Didn't know about the @ thing, btw. I just usually do it because it makes comment discussions easier to follow :) Fair enough on the GCC header files. I assumed that it would compile a whole set of headers (i.e. header A includes headers B, C, D, which include E, F, G, etc.). Definitely interesting to know. Still, though, even using the pre-compiled header will be slower than not including the header at all ...
Goz
@Goz: I think we're going in circles, comparing isn't useful without equivalents. Yes, it can sometimes be faster to use a declaration instead, but the trade-off is manual synchronization, etc. and that trade-off can be worthwhile sometimes (as I said in my last comment). However, I'll end with at least two standard library headers for which I believe including is always faster: [ciso646](http://bitbucket.org/rdpate/stdtags/src/03766f859aa5/c++03/ciso646) and [iso646.h](http://bitbucket.org/rdpate/stdtags/src/03766f859aa5/c++03/iso646.h). (A conforming implementation only needs empty files. ;)
Roger Pate
+1  A: 

Hmm, the question is quite unclear. And, to put it simply, it depends.

In an arbitrary scenario I think translation units will not become shorter and easier to compile. The main intent of forward declarations is to provide convenience to the programmer.

Keynslug
+1  A: 
#include "myClass.h"

is 1..n lines

class myClass;

is 1 line.

You will save time unless all your headers are one-liners. There is no impact on the compilation itself: a forward declaration simply tells the compiler that a specific symbol exists and is defined elsewhere, and it works only where the compiler doesn't need details of that symbol (its data size, for example). So the time spent reading the included files is saved every time you replace an include with a forward declaration. There is no general measure for this, as it is a per-project value, but it is a recommended practice for large C++ projects (see Large-Scale C++ Software Design by John Lakos for more tricks to manage large projects in C++, even if some of them are dated).
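
A small sketch of where an incomplete type is and isn't enough (my own illustration, reusing the asker's myClass):

class myClass;                 // incomplete type from here on

myClass* p = 0;                // OK: pointer to incomplete type
void f(myClass&);              // OK: reference parameter
// myClass m;                  // error: the compiler needs the size
// int n = sizeof(myClass);    // error: same reason

#include "myClass.h"           // now the type is complete
myClass m;                     // OK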

Another way to limit the time the compiler spends on headers is pre-compiled headers.

Matthieu
There is only a *very loose* relationship between LOC and time to compile. Very, *very* loose.
Roger Pate
+4  A: 

Have a look at John Lakos's excellent book Large-Scale C++ Software Design -- I think he has some figures for forward declaration, obtained by looking at what happens if you include N headers M levels deep.

If you don't use forward declarations, then aside from increasing the total build time from a clean source tree, it also vastly increases the incremental build time, because header files are included unnecessarily. Say you have 4 classes, A, B, C and D. C uses A and B in its implementation (i.e. in C.cpp) and D uses C in its implementation. The interface of D is forced to include C.h because of this 'no forward declaration' rule. Similarly, C.h is forced to include A.h and B.h, so whenever A or B is changed, D.cpp has to be rebuilt even though it has no direct dependency (see the sketch below). As the project scales up, this means that touching any header can cause huge amounts of code to be rebuilt that just doesn't need to be.
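
A sketch of that dependency chain, under my assumption that each class holds the others by pointer:

// --- With the 'no forward declaration' rule ---

// C.h
#include "A.h"       // pulled in even though C uses A and B
#include "B.h"       // only inside C.cpp
class C {
    A* a;
    B* b;
};

// D.h
#include "C.h"       // drags A.h and B.h in transitively, so touching
class D {            // A.h or B.h rebuilds everything that includes D.h
    C* c;
};

// --- With forward declarations ---

// C.h
class A;
class B;
class C {
    A* a;
    B* b;
};

// D.h
class C;
class D {
    C* c;
};

In the second version, editing A.h or B.h only rebuilds C.cpp, not every file that includes D.h.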

To have a rule that disallows forward declarations is (in my book) very bad practice indeed. It's going to waste huge amounts of the developers' time for no gain. The general rule of thumb should be that if the interface of class B depends on class A, then it should include A.h; otherwise, forward declare it. In practice, 'depends on' means inherits from, uses as a member variable, or uses any methods of. The Pimpl idiom is a widespread and well-understood method for hiding the implementation from the interface, and it allows you to vastly reduce the amount of rebuilding needed in your codebase.
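
For reference, a minimal sketch of the Pimpl idiom (my own illustration, not from the poster's codebase):

// widget.h -- clients of Widget see no implementation headers at all
class Widget {
public:
    Widget();
    ~Widget();
    void draw();
private:
    class Impl;       // forward-declared implementation class
    Impl* pimpl_;     // clients never see its definition
};

// widget.cpp
#include "widget.h"
// heavy implementation headers go here, invisible to clients

class Widget::Impl {
public:
    void draw() { /* actual drawing code */ }
};

Widget::Widget() : pimpl_(new Impl) {}
Widget::~Widget() { delete pimpl_; }
void Widget::draw() { pimpl_->draw(); }

Changing Widget::Impl now only rebuilds widget.cpp; nothing that includes widget.h is touched.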

If you can't find the figures from Lakos then I would suggest creating your own experiments and taking timings to prove to your management that this rule is absolutely wrong-headed.

the_mandrill
Note that _Large-Scale C++ Software Design_ was published in 1996. There have been huge improvements in compiler performance since then (most notably, I don't think precompiled headers were supported by most compilers in 1996).
James McNellis
Thank you very much. This is quite helpful.
JoshD
@James: yes, precompiled headers and multithreaded/parallelising compilers have moved on a long way, but our company's codebase has also vastly increased in size since 1996. I think the core tenets of the book are as relevant today as they were back then.
the_mandrill
+1  A: 
Roger Pate
Thank you very much for your informative response.
JoshD