I'd like perf to output raw sample counts rather than percentages. This is useful for determining whether I've sped up a function I'm trying to optimize.

To be clear, I'd like to do something like

perf record ./a.out
perf report

and see how many times perf sampled each function in a.out.

Shark can do this on Mac, as can (I believe) Xperf. Is this possible on Linux with perf?

A: 

You want to see if your changes to a function made a difference. I presume you also want whatever help you can get in finding out which function you need to change. Those two objectives are not the same.

Many tools give you as broad a set of statistics or counters as they can dream up, as if having more statistics will help either goal.

Can you get hold of RotateRight/Zoom, or any tool that gives you stack samples on wall-clock time, preferably under user control? Such a tool will give you time and percent spent in any routine or line of code, in particular inclusive time.

The reason inclusive time is so important is that every single line of code that is executed is responsible for a certain fraction of time, such that if the line were not there, that fraction of time would not be spent, and overall time would be reduced by that fraction. During that fraction of time, whether it is spent in one big chunk or thousands of little chunks, that line of code is on the call stack, where stack samples will spot it, at a rate equal to its fraction. That is why stack sampling is so effective in finding code worth optimizing, whether it consists of leaf instructions or calls in the call tree.
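The claim that a line shows up in stack samples at a rate equal to its fraction of time can be checked with a small simulation. This is only a sketch under stated assumptions: `sample_stack_fraction` and its parameters are hypothetical names, and each sample is modeled as an independent random instant rather than taken from a real profiler.

```python
import random


def sample_stack_fraction(true_fraction, num_samples, seed=0):
    """Estimate how often a line is on the call stack from random-time samples.

    true_fraction: assumed share of wall-clock time the line is on the stack.
    num_samples:   number of stack samples taken at random instants.
    """
    rng = random.Random(seed)
    # Each sample catches the line with probability equal to its time fraction.
    hits = sum(1 for _ in range(num_samples) if rng.random() < true_fraction)
    return hits / num_samples


if __name__ == "__main__":
    for n in (10, 100, 10000):
        estimate = sample_stack_fraction(true_fraction=0.30, num_samples=n)
        print(f"{n:>6} samples: line seen on {estimate:.1%} of them")
```

With only a handful of samples the estimate is rough, but a line responsible for a large fraction of time still tends to appear on several of them, which is the point made above.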

This link gives the how and why of the method I personally use, which is not fancy but is as effective as, or more effective than, any method or tool I've seen. Here's a discussion.

Mike Dunlavey
perf annotate will show you which of your instructions are slow. But again, it's only showing you percentages, not raw sample counts.
Justin L.
@Justin L.: Whether "slow" or "fast", if it is a function call instruction, what it will save you if removed is the percent of time it is on the stack. If it is not a function call instruction, what it will save if removed is the percent of time it is being executed (i.e. at the top of the stack). I don't understand what raw sample counts actually tell you. I do understand what raw random-time stack samples tell you. They tell you precisely where to concentrate your optimizing, and approximately how much you can expect to save, and you don't need very many of them.
Mike Dunlavey
@Justin L.: That's what Zoom does extremely well.
Mike Dunlavey
@Mike Dunlavey, I think you may be misunderstanding my question. If I profile function A and measure a% of execution time spent in A, then I modify A, creating function B, and measure b% of execution time spent in B, it's nontrivial to figure out how much faster B is than A, since by making A faster, I've decreased the total execution time! But if I know that we saw 100 samples in A and only 80 in B, it's easy.
Justin L.
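To make the arithmetic in the comment above concrete, here is a small worked example. The 100 and 80 sample counts come from the comment; the 400 samples for the rest of the program are an assumed figure, as is the premise that both runs use the same sampling rate and that only the modified function changes.

```python
def percent(samples_in_func, samples_total):
    """Share of all samples that landed in one function, as a percentage."""
    return 100.0 * samples_in_func / samples_total


rest = 400       # assumed samples outside the function, identical in both runs
a, b = 100, 80   # samples in the original (A) and modified (B) function

a_pct = percent(a, rest + a)   # share of run 1 spent in A
b_pct = percent(b, rest + b)   # share of run 2 spent in B

count_ratio = b / a            # direct comparison of raw sample counts
pct_ratio = b_pct / a_pct      # naive comparison of the two percentages

if __name__ == "__main__":
    print(f"A: {a_pct:.2f}% of run 1, B: {b_pct:.2f}% of run 2")
    print(f"raw counts say B costs {count_ratio:.3f}x of A")
    print(f"percentages say B costs {pct_ratio:.3f}x of A")
```

The two ratios disagree because the total shrinks from 500 to 480 samples between runs, so the percentages understate the improvement unless the totals are also known.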
I'm not interested in using a different tool to accomplish this. I'm aware of other tools that do what I want -- Shark, for instance. The question is just whether perf can do this. Sounds like your answer is no, which is perfectly fine, and probably right. :)
Justin L.