I am building a lot of auto-generated code, including one particularly large file (~15K lines), using a mingw32 cross compiler on linux. Most files are extremely quick, but this one large file takes an unexpectedly long time (~15 minutes) to compile.

I have tried manipulating various optimization flags to see if they had any effect, without any luck. What I really need is some way of determining what g++ is doing that is taking so long. Are there any (relatively simple) ways to have g++ generate output about different phases of compilation, to help me narrow down what the hang-up might be?

Sadly, I do not have the ability to rebuild this cross-compiler, so adding debugging information to the compiler and stepping through it is not a possibility.

What's in the file:

  • a bunch of includes
  • a bunch of string comparisons
  • a bunch of if-then checks and constructor invocations

The file is a factory for producing a ton of different specific subclasses of a certain parent class. Most of the includes, however, are nothing terribly fancy.


The results of -ftime-report, as suggested by Neil Butterworth, indicate that the "life analysis" phase is taking 921 seconds, which takes up most of the 15 minutes.

It appears that this happens during data flow analysis. The file itself is a long chain of conditional string comparisons that constructs an object whose class name is provided as a string.

We think replacing this with a map from names to factory function pointers might improve things, so we're going to try that.


Indeed, generating a factory function per object and building a map from each object's string name to a pointer to its factory function reduced the compile time from the original 15 minutes to about 25 seconds, which will save everyone a ton of time on their builds.
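For reference, the pattern we ended up with looks roughly like the sketch below; the class and function names (Parent, WidgetA, makeWidgetA, ...) are made up for illustration, since the real file is auto-generated:

    #include <map>
    #include <string>

    struct Parent { virtual ~Parent() {} };
    struct WidgetA : Parent {};
    struct WidgetB : Parent {};

    // One tiny factory function per generated subclass.
    static Parent* makeWidgetA() { return new WidgetA; }
    static Parent* makeWidgetB() { return new WidgetB; }

    typedef Parent* (*Factory)();

    Parent* create(const std::string& name)
    {
        static std::map<std::string, Factory> factories;
        if (factories.empty()) {
            factories["WidgetA"] = &makeWidgetA;
            factories["WidgetB"] = &makeWidgetB;
            // ...one entry per generated subclass...
        }
        std::map<std::string, Factory>::const_iterator it = factories.find(name);
        return it == factories.end() ? 0 : it->second();
    }

The point of the change is that the compiler no longer has to analyse one enormous function full of branches; each factory is a tiny function of its own.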

Thanks again to Neil Butterworth for the tip about -ftime-report.

+3  A: 

It most probably pulls in tonnes of includes. I believe -MD will list out all the include files for a given .cpp file (including includes of includes, and so forth).

Goz
++ Yes. Nested includes and templates can send the preprocessor off on a merry ride. At least the includes can be handled with precompiled headers.
Mike Dunlavey
+2  A: 

What generally slows g++ down is templates. Boost, for example, loves to use them. This means nice code and great performance, but poor compile speed.

On the other hand, 15 minutes seems extremely long. After a quick googling, it seems to be a common problem with mingw.

Tristram Gräbener
Not just mingw. I've seen Linux gcc and Visual Studio behave just as badly. Then again, those were the kind of source packages compiled only once, so the compilers didn't have the opportunity to speed it up.
luiscubal
@Tristram What was your google search string?
Schamp
+12  A: 

Won't give all the details you want, but try running with the -v (verbose) and -ftime-report flags. The latter produces a summary of what the compiler has been up to.

anon
-ftime-report was the clue I needed.
Schamp
A: 

Another approach to try is to add "progress marker" pragmas to your code to locate the portion of the code that is taking a long time. The Visual Studio compiler provides #pragma message(), although there is no standard pragma for doing this.

Put one marker at the beginning of the code and another at the end. The end marker could be an #error, since you don't care about the remainder of the source file. Move the markers accordingly to trap the section of code taking the longest time.
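For example, a cut-down, hypothetical sketch of what that might look like (#pragma message is assumed to be available; MSVC supports it, and newer GCC versions do as well):

    // Hypothetical, trimmed-down version of the slow file, to illustrate the idea.
    #include <string>

    #pragma message("marker 1: headers done")

    int firstChunkOfFactoryCode()  { return 1; }   // stand-in for real code
    int secondChunkOfFactoryCode() { return 2; }   // stand-in for real code

    #pragma message("marker 2: factory chunks done")

    // Deliberately stop compilation here for this timing run; move the #error
    // (and the markers) around to bisect the slow section.
    #error end of timing run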

Just a thought...

Thomas Matthews
A: 

I'd use #if 0 / #endif to eliminate large portions of the source file from compilation. Repeat with different blocks of code until you pinpoint which block(s) are slow. For starters, you can see if your #include's are the problem by using #if 0 / #endif to exclude everything but the #include's.
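For example, a hypothetical sketch of one bisection step (the class names are stand-ins for the generated subclasses):

    #include <string>

    struct Parent {};
    struct WidgetA : Parent {};
    struct WidgetB : Parent {};

    Parent* create(const std::string& name)
    {
        if (name == "WidgetA") return new WidgetA;
    #if 0   // temporarily excluded from this timing run -- move the block around
        if (name == "WidgetB") return new WidgetB;
        // ...hundreds more comparisons and constructor calls...
    #endif
        return 0;
    }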

Josh Kelley
A: 

Related to @Goz and @Josh_Kelley, you can get gcc/g++ to spit out the preprocessed source (with #includes inline) using -E. That's one way to determine just how large your source is.

And if the compiler itself is the problem, you may be able to strace the compile command that's taking a long time to see whether there's a particular file access or a specific internal action that's taking a long time.

Dave Bacher
A: 

What the compiler sees is the output of the pre-processor, so the size of the individual source file is not a good measure; you have to consider the source plus all the files it includes, and the files they include, etc. Instantiating templates for multiple types generates code for each separate type used, so that can end up being a lot of code, for example if you have made extensive use of STL containers for many different types.

15K lines in one source file is rather a lot, but even if it were split up, all that code would still need to be compiled; however, using an incremental build may mean that it does not all need compiling all the time. There really is no need for a file that large; it's just poor practice/design. I start thinking about better modularisation when a file gets to 500 lines (although I am not dogmatic about it).
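For example (types picked arbitrarily), each of the following is a distinct instantiation that the compiler must generate code for separately, even though the source only names std::vector and std::map:

    #include <map>
    #include <string>
    #include <vector>

    // Four distinct instantiations, each compiled as separate code.
    std::vector<int>                  a;
    std::vector<std::string>          b;
    std::map<std::string, int>        c;
    std::map<int, std::vector<int> >  d;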

Clifford
A: 

One thing to watch during the compile is how much memory your computer has free. If the compiler allocates so much memory that the computer starts swapping, compile time will go way, way up.

If you see that happen, an easy solution is to install more RAM... or just split the file into multiple parts that can be compiled separately.

Jeremy Friesner