Has anyone seen any real-world numbers for different programs that use the feedback/profile-guided optimization C/C++ compilers offer to support branch prediction, cache preloading, and so on?

I searched for it and, surprisingly, not even the popular interpreter development groups seem to have checked the effect. Increasing Ruby, Python, PHP, etc. performance by 10% or so ought to be considered useful.

Is there really no benefit, or is the whole developer community just too lazy to use it?
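
For context, the workflow I mean looks roughly like the sketch below with GCC. The -fprofile-generate/-fprofile-use flags are GCC's real ones; the program itself is just a made-up stand-in:

    /* hot_loop.c -- made-up stand-in; any program is built the same way.
     *
     * Typical GCC feedback-directed build:
     *
     *   gcc -O2 -fprofile-generate hot_loop.c -o hot_loop
     *   ./hot_loop                     # run a representative workload
     *   gcc -O2 -fprofile-use hot_loop.c -o hot_loop
     */
    #include <stdlib.h>

    int count_even(const int *v, int n) {
        int c = 0;
        for (int i = 0; i < n; ++i)
            if (v[i] % 2 == 0)   /* profile data records how often  */
                ++c;             /* this branch is actually taken   */
        return c;
    }

    int main(void) {
        int v[1000];
        for (int i = 0; i < 1000; ++i)
            v[i] = rand();
        return count_even(v, 1000) > 0 ? 0 : 1;
    }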

+5  A: 

10% is a good ballpark figure. That said, ...

You have to REALLY care about the performance to go this route. The product I work on (DB2) uses PGO and other invasive and aggressive optimizations. Among the costs are significantly longer build times (triple on some platforms) and development and support nightmares.

When something goes wrong, it can be non-trivial to map the fault location in the optimized code back to the source. Developers don't usually expect that functions in different modules can end up merged and inlined, and this can have "interesting" effects.

Problems with pointer aliasing, which are nasty to track down, also usually show up with these sorts of optimizations. You have the additional fun of non-deterministic builds (an aliasing problem can show up in Monday's build, vanish again until Thursday's, ...).
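
A contrived sketch of the kind of bug in this class (not real product code): the type-punned read below violates C's strict-aliasing rules, so it tends to "work" at -O0 and break only in optimized builds.

    #include <stdio.h>
    #include <string.h>

    /* Contrived sketch, not real product code: writing an int and then
     * reading it through a float pointer violates C's strict-aliasing
     * rules. At -O0 it usually appears to work; at -O2 the optimizer
     * may assume the two pointers cannot alias and reorder or drop
     * accesses, so the bug surfaces only in optimized builds. */
    float bits_as_float(int *p) {
        *p = 0x40490fdb;          /* bit pattern of (float)pi */
        return *(float *)p;       /* undefined behaviour: type-punned read */
    }

    /* The portable fix: copy the bytes instead of type-punning. */
    float bits_as_float_fixed(int *p) {
        float f;
        *p = 0x40490fdb;
        memcpy(&f, p, sizeof f);
        return f;
    }

    int main(void) {
        int storage;
        printf("%f %f\n", bits_as_float(&storage),
                          bits_as_float_fixed(&storage));
        return 0;
    }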

The line between correct and incorrect compiler behaviour under these sorts of aggressive optimizations also becomes fairly blurred. Even with the luxury of having our compiler guys in house (literally), the optimization issues (whether in our source or in the compiler) are still not easy to understand and resolve.

Peeter Joot
This. PGO is the mortal enemy of rapid iteration, and you can't just leave it off for testing because every once in a while it'll introduce a bug. It's not that there's no benefit, but the perf gains are marginal compared to the development and support costs for most applications.
David Seiler
Well put, David. And if you use major release boundaries to make compiler updates, be prepared to have these optimizations not work for months afterwards (and probably get things working again right near your GA dates ;). Also be prepared for well-behaved, stable service-release builds to suddenly start misbehaving. And, ... Why this isn't common probably comes down to money: there is a big staffing expense in using PGO in a product.
Peeter Joot
Note that PGO usually doesn't _introduce_ bugs but rather _exposes_, _finds_ or _trips over_ bugs.
MSalters
Humph... That's a lot of fuss for a lousy 10%. http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773
Mike Dunlavey
+1  A: 

Traditionally, improving compiler output via profiling is done with performance analysis tools. However, how the data from such tools can be used in optimization still depends on the compiler you use. For example, GCC is a framework being worked on to produce compilers for different domains, and providing a profiling mechanism in such a compiler framework will be extremely difficult.

We can rely on statistical data to do certain optimizations. For instance, GCC unrolls a loop if the loop count is less than a constant (say, 7). How it fixes that constant is based on statistical results about the code size generated for different target architectures; a sketch follows.
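
To illustrate (a sketch only; the threshold of 7 is the figure above, not something verified against GCC's sources):

    /* Sketch of the heuristic described above: a loop whose trip count
     * is a small compile-time constant is a candidate for full
     * unrolling, while a loop with an unknown count is not. */
    void scale_fixed(float *v, float s) {
        for (int i = 0; i < 4; ++i)     /* likely fully unrolled */
            v[i] *= s;
    }

    void scale_any(float *v, int n, float s) {
        for (int i = 0; i < n; ++i)     /* trip count unknown at compile time */
            v[i] *= s;
    }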

Profile-guided optimizations track particular hot regions of the source. Details of previous run results need to be stored, which is an overhead. The input, meanwhile, must be a statistically representative workload for the target application, so the complexity rises with the number of different inputs and outputs. In short, profile-guided optimization requires extensive data collection, and automating or embedding such profiling into the build needs careful monitoring. If not, the entire result will be awry, and in our effort to swim we will actually drown.

However, experimentation in this area is ongoing. Just have a look at POGO.

Ganesh Gopalasubramanian
+2  A: 

From unladen-swallow (a project optimizing the CPython VM):

For us, the final nail in PyBench's coffin was when experimenting with gcc's feedback-directed optimization tools, we were able to produce a universal 15% performance increase across our macrobenchmarks; using the same training workload, PyBench got 10% slower.

So some people are at least looking at it. That said, PGO sets some pretty tricky requirements on the build environment that are hard to satisfy for open-source projects meant to be built by a distributed, heterogeneous group of people. Heavy optimization also creates difficult-to-debug heisenbugs. It's often less work to give the compiler explicit hints for the performance-critical parts, as sketched below.
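
For example, with GCC/Clang the usual manual hint is __builtin_expect; the likely()/unlikely() wrappers below follow the Linux kernel's convention, and parse_packet is just a made-up illustration:

    /* GCC/Clang builtin for manual branch-probability hints; the
     * likely()/unlikely() wrappers mirror the Linux kernel macros. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Made-up illustration: mark the error path as cold so the
     * compiler lays the hot path out as straight-line fall-through. */
    int parse_packet(const unsigned char *buf, int len) {
        if (unlikely(len < 4))
            return -1;
        return buf[0];   /* ... hot path continues ... */
    }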

That said, I expect significant performance increases from runtime profile-guided optimization. JIT compilation lets the optimizer cope with the profile of the data changing across a program's execution, and perform many optimizations that are highly specific to runtime data and would explode the code size under static compilation. Dynamic languages especially need good runtime-data-based optimization to perform well. With dynamic-language performance getting significant attention lately (JavaScript VMs, the MS DLR, JSR-292, PyPy, and so on), there's a lot of work being done in this area.

Ants Aasma