I am using Google's perftools (http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html) for CPU profiling. It's a wonderful tool that has helped me make a great many CPU-time improvements to my application.

Unfortunately, I have reached the point where the code is still a bit slow. When it is compiled at g++'s -O3 optimization level, all I know is that one specific function is slow, not which parts of it are slow.

If I remove the -O3 flag, the rest of the program, now unoptimized, dominates the profile instead of this function, so I get little insight into which parts of the function are actually slow. If I leave -O3 in, the slow parts of the function are inlined, and I still can't tell which of them are slow.

Any suggestions? Thanks for your help!

+6  A: 

For something like this, I've always used the "old school" way of doing it:

At various points in the routine you want to measure, insert statements that read the current time (or CPU time). Then print or log the differences between them, and you will know how long each section of code took. From there you can find what is eating most of the time, then add finer-grained timing within that section until you know what the problem is and how to fix it.
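A minimal sketch of that instrumentation in C++11, assuming wall-clock time from `std::chrono::steady_clock` is good enough (for CPU time you would swap in a CPU-time clock instead). Because a hot function is usually called many times, the sketch accumulates per-section totals across calls rather than printing on every call. `SectionTimes`, `slow_function`, and its three stages are hypothetical placeholders for the real code.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

using Clock = std::chrono::steady_clock;

// Wall-clock seconds spent in each instrumented section, summed over all calls.
// The section names are hypothetical placeholders for the real function's stages.
struct SectionTimes {
    double fill = 0.0;
    double transform = 0.0;
    double reduce = 0.0;
};

// Seconds between two time points. steady_clock is monotonic, so a negative
// interval would indicate a bug in how the time points were taken.
double elapsed(Clock::time_point from, Clock::time_point to) {
    assert(to >= from);
    return std::chrono::duration<double>(to - from).count();
}

// Hypothetical stand-in for the slow function: three stages, each bracketed
// by Clock::now() so its cost can be reported separately.
double slow_function(std::size_t n, SectionTimes& times) {
    const auto t0 = Clock::now();
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<double>(i);
    const auto t1 = Clock::now();

    for (double& x : v) x = std::sqrt(x) * std::sin(x);
    const auto t2 = Clock::now();

    double sum = 0.0;
    for (double x : v) sum += x;
    const auto t3 = Clock::now();

    times.fill += elapsed(t0, t1);
    times.transform += elapsed(t1, t2);
    times.reduce += elapsed(t2, t3);
    // The caller should use this result (e.g. print it) so that -O3
    // cannot discard the work being timed as dead code.
    return sum;
}

// Print the accumulated totals once, e.g. at program exit.
void report(const SectionTimes& times) {
    std::printf("fill      %.6f s\n", times.fill);
    std::printf("transform %.6f s\n", times.transform);
    std::printf("reduce    %.6f s\n", times.reduce);
}
```

Call `slow_function` as usual from the program, keep one `SectionTimes` alive for the whole run, and call `report` at the end. With -O3 the compiler may still move some arithmetic across the `Clock::now()` calls, so treat the split as approximate and confirm by narrowing the brackets around the suspicious section.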

If the function-call overhead itself is not the problem, you can also turn inlining off with -fno-inline-small-functions -fno-inline-functions -fno-inline-functions-called-once -fno-inline (I'm not exactly sure how these switches interact with each other, but I think they are independent). Then you can use your ordinary profiler to look at the call-graph profile and see how much time each function call takes.
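If disabling inlining for the whole program distorts the profile too much, a narrower alternative (not something the answer above proposes) is to split the slow function into helpers and mark only those helpers with the `noinline` function attribute, which GCC provides and Clang also accepts. The rest of the program stays fully optimized, while the profiler can attribute samples to each helper separately. The helper names below (`mean`, `sum_sq_dev`, `variance`) and the `PROFILE_NOINLINE` macro are hypothetical illustrations.

```cpp
#include <cassert>
#include <vector>

// GCC/Clang extension: keep a function out of line even at -O3, so a
// profiler attributes its samples to its own symbol instead of folding
// them into the caller. The extra call overhead is real, so use sparingly.
#define PROFILE_NOINLINE __attribute__((noinline))

// Hypothetical helper: stage 1 of the slow function.
PROFILE_NOINLINE double mean(const std::vector<double>& v) {
    assert(!v.empty());  // the mean of an empty range is undefined
    double s = 0.0;
    for (double x : v) s += x;
    return s / static_cast<double>(v.size());
}

// Hypothetical helper: stage 2 of the slow function.
PROFILE_NOINLINE double sum_sq_dev(const std::vector<double>& v, double m) {
    double s = 0.0;
    for (double x : v) s += (x - m) * (x - m);
    return s;
}

// The function under investigation: in a call-graph profile each helper
// now appears as a separate callee with its own share of the samples.
double variance(const std::vector<double>& v) {
    const double m = mean(v);
    return sum_sq_dev(v, m) / static_cast<double>(v.size());
}
```

Once the hot helper is identified, the attribute can be removed again so that -O3 can inline it as before.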

Greg Rogers
Thanks Greg! If it had not been for oprofile (below), I think I would have gone for the precision timing idea that you suggested.
Adam
+5  A: 

If you're on Linux, use oprofile. If you're on Windows, use AMD's CodeAnalyst.

Both give sample-based profiles down to the level of individual source lines or assembly instructions, so you should have no problem identifying "hot spots" within functions.

timday
I can't speak for CodeAnalyst, but oprofile is amazing! The opannotate command gave me source-line annotation exactly as you described. Thanks!
Adam
CodeAnalyst is a GUI over a specialized version of oprofile. You can also use CodeAnalyst on Linux.
Carlos
+1  A: 

I've spent decades doing performance tuning.

People love their tools, but I swear by this method.

Mike Dunlavey