ansaurus

Question

How am I supposed to interpret OProfile output?

Answer 1

+1 A:

When profiling optimized code you really cannot rely on samples being attributed to the right source lines. The compiler moves stuff around far too much.

For an accurate picture you will need to look at the disassembly output.

Zan Lynx 2010-10-27 15:01:54

Will it make sense if I profile the unoptimized version instead? Or will the bottlenecks discovered therein be totally unrelated to the optimized version? Also, is there any way to get percentage-annotated disassembler output? ;)

neuviemeporte 2010-10-27 15:07:44

Wow, apparently there's an option to opannotate called --assembly, thanks!

neuviemeporte 2010-10-27 15:09:07

Answer 2

A:

OProfile can (they tell me) get stack samples on wall-clock time (not CPU time), and it can give you line-level percentages. What you are looking for is lines that appear on a large percentage of stack samples.

I wouldn't turn on compiler optimization until after I finished hand-tuning the code, because it just hides things.

When you say the interpolate routine uses 84% of the time, that triggers a question. The entire program takes some total time, right? It takes 100% of that time. If you cut the program's time in half, or if you double it, it will still take 100% of the time. Whether 84% for interpolation is too much or not depends on whether it is being done more than necessary.

So I would suggest that you not ask if the percent of a routine is too much. Rather, look for lines of code that take a significant amount of time and ask if they could be optimized. See the difference? After you optimize the code, it can make a large reduction in overall run time, but it might still be a large percent of a smaller total. Code isn't optimal when nothing takes a large percent. Code is optimal when, of all the things that take a large percent, none can be improved.
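To make the "large percent of a smaller total" point concrete, here is a minimal sketch of the arithmetic. The timings (100 s total, 84 s in the routine) and the 2x speedup are made-up numbers for illustration, not measurements from the question, and `share_after_speedup` is a hypothetical helper name.

```python
def share_after_speedup(total, routine, speedup):
    """Return (new_total, new_share) after the routine's time is divided by `speedup`.

    total   -- whole-program run time in seconds (assumed value)
    routine -- seconds spent in the routine being tuned (assumed value)
    """
    new_routine = routine / speedup
    # Everything outside the routine is assumed to take the same time as before.
    new_total = total - routine + new_routine
    return new_total, new_routine / new_total


# Hypothetical numbers: 84 s of a 100 s run, routine made twice as fast.
new_total, new_share = share_after_speedup(total=100.0, routine=84.0, speedup=2.0)
print(f"new total: {new_total:.1f}s, routine share: {new_share:.1%}")
# prints "new total: 58.0s, routine share: 72.4%"
```

Under these assumptions the run time drops by 42%, yet the routine still accounts for well over half of what remains.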

I don't care for things that just give numbers. What I want is insight. For example, if that routine accounts for 84% of the time, then if you took 10 samples of the stack, it would be on 8.4 of them. The exact number doesn't matter. What matters is to understand why it was in there. Was it really really necessary to be in there so much? That's what looking at the stack samples can tell you. Maybe you're actually doing the interpolation twice as often as necessary? Often people find out, by analyzing the why, that the routine they're trying to speed up didn't need to be called nearly as much, maybe not at all. I can't guess in your case. Only the insight from examining the program's state can tell you that.
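As a rough sketch of what "looking at the stack samples" could mean in practice: the ten stacks below are invented for illustration (they are not OProfile output, and names such as `render` and `resize_preview` are hypothetical). The code counts how many samples a function appears on, and then which callers put it there, which is where the "why" starts to show.

```python
from collections import Counter


def inclusive_fraction(samples):
    """Fraction of samples on which each function appears anywhere in the stack."""
    counts = Counter()
    for stack in samples:
        for frame in set(stack):  # count a function once per sample, even if recursive
            counts[frame] += 1
    return {name: c / len(samples) for name, c in counts.items()}


def callers_of(samples, target):
    """Count the immediate caller of `target` on each sample that contains it."""
    callers = Counter()
    for stack in samples:
        if target in stack:
            i = stack.index(target)  # stacks are listed outermost frame first
            callers[stack[i - 1] if i > 0 else "<root>"] += 1
    return callers


# Hypothetical samples: 10 stacks, outermost frame first.
samples = (
    [["main", "render", "interpolate"]] * 5
    + [["main", "resize_preview", "interpolate"]] * 3
    + [["main", "load_image"]] * 2
)

fractions = inclusive_fraction(samples)
callers = callers_of(samples, "interpolate")
print(fractions["interpolate"])  # 0.8
print(dict(callers))             # {'render': 5, 'resize_preview': 3}
```

In this made-up case, the useful question is not whether 80% is too much, but whether the three samples that come through `resize_preview` needed to interpolate at all.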

Mike Dunlavey 2010-10-27 20:56:44
