Does anyone know about this compiler feature, profile-guided optimization (PGO)? It seems GCC supports it. How does it work? What is the potential gain? In which cases is it useful? Inner loops?

(This question is about this specific feature, not about optimization in general. Thanks.)

A: 

The fun thing about optimization is that speed gains are found in the unlikeliest of places.

It's also the reason you need a profiler, rather than guessing where the speed problems are.

I recommend starting with a profiler (gprof if you're using GCC) and just poking around in the results of running your application through some normal operations.

Jason Cohen
+1  A: 

Jason's advice is right on. The best speedups you are going to get come from 'discovering' that you let an O(n^2) algorithm slip into an inner loop somewhere, or that you can cache certain computations outside of expensive functions.

Compared to the micro-optimizations that PGO can trigger, these are the big winners. Once you've done that level of optimization, PGO might be able to help. We never had much luck with it, though: the cost of the instrumentation was such that our application became unusably slow (by several orders of magnitude).

I like using Intel VTune as a profiler, primarily because it is non-invasive compared to instrumenting profilers, which change behaviour too much.

Rob Walker
+4  A: 

It works by inserting extra code that counts how many times each code path is taken. You run this instrumented build on typical input, and when you compile a second time, the compiler uses what it learned about your program's execution, information it could previously only guess at. There are a couple of things PGO can work toward:

  • Deciding which functions should be inlined or not depending on how often they are called.
  • Deciding where to place hints about which branch of an "if" statement should be predicted, based on the percentage of executions going one way or the other.
  • Deciding how to optimize loops based on how many iterations they typically run each time they are entered.

You never really know how much these things can help until you test it.

Greg Rogers
Thank you, that seems interesting. Do you have a documentation link to share?
elmarco
The branch prediction hints you mentioned are not used and are totally useless, because branch hint prefixes are only used the first time the CPU encounters a branch. The real benefit is GCC knowing how best to structure complex, branchy code based on branch probabilities.
Dark Shikari
As the OP asked, it would be great if you could extend your answer with information on how to use this (e.g. specific options).
Richard Corden
+1  A: 

PGO gives about a 5% speed boost for x264, the project I work on, and we have a built-in system for it (make fprofiled). It's a nice free speed boost in some cases, and it probably helps more in applications that, unlike x264, contain less handwritten assembly.

Dark Shikari