short version:

Is there a good time based sampling profiler for Linux?

long version:

I generally use OProfile to optimize my applications. I recently found a shortcoming that has me wondering.

The problem was a tight loop spawning c++filt to demangle a C++ name. I only stumbled upon the code by accident while chasing down another bottleneck. The OProfile output didn't show anything unusual about the code, so I almost ignored it, but my code sense told me to optimize the call and see what happened. I changed the popen of c++filt to abi::__cxa_demangle. The runtime went from more than a minute to a little over a second, about a 60x speedup.
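For readers who want to see the shape of that change, here is a minimal sketch of in-process demangling with abi::__cxa_demangle (available with GCC and Clang). The helper name `demangle` and its fall-back-to-input behaviour are assumptions made for illustration, not the code from the question.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>    // std::free
#include <memory>
#include <string>

// Hypothetical helper: demangle in-process instead of spawning c++filt.
// Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle mallocs the result,
    // so the caller is responsible for freeing it.
    std::unique_ptr<char, decltype(&std::free)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

Calling `demangle("_Z3foov")` yields `foo()`. No child process is created, which is presumably where the time went in the popen version.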

Is there a way I could have configured OProfile to flag the popen call? As the profile data stands now, OProfile thinks the bottleneck was the heap and std::string calls (which, BTW, once optimized dropped the runtime to less than a second, more than a 2x speedup).

Here is my OProfile configuration:

$ sudo opcontrol --status
Daemon not running
Event 0: CPU_CLK_UNHALTED:90000:0:1:1
Separate options: library
vmlinux file: none
Image filter: /path/to/executable
Call-graph depth: 7
Buffer size: 65536

Is there another profiler for Linux that could have found the bottleneck?

I suspect the issue is that OProfile only logs its samples to the currently running process. I'd like it to always log its samples to the process I'm profiling. So if the process is currently switched out (blocking on IO or a popen call) OProfile would just place its sample at the blocked call.

If I can't fix this, OProfile will only be useful when the executable is pushing near 100% CPU. It can't help with executables that have inefficient blocking calls.
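To illustrate the gap being described, here is a small sketch that times a blocking call in both wall-clock time and process CPU time. The names `Timing`, `measure` and `blocking_call` are made up for this example, and the sleep merely stands in for a blocking popen or IO call. It assumes Linux, where std::clock reports CPU time used by the calling process.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical result type: elapsed wall-clock and process CPU seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Run `work` once and report both clocks. On Linux, std::clock counts CPU
// time of the calling process only (not time spent blocked, and not time
// spent in child processes).
template <typename F>
Timing measure(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(wall_end - wall_start).count(),
            static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}

// Stand-in for a blocking call: the process waits but burns almost no CPU.
void blocking_call() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
```

`measure(blocking_call)` reports roughly 0.2 s of wall time and close to zero CPU time. A profiler that samples only on CPU cycles sees the second number, which is why the blocked time is invisible to it.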

+3  A: 

Glad you asked. I believe OProfile can be made to do what I consider the right thing, which is to take stack samples on wall-clock time while the program is being slow and, if it won't let you examine individual stack samples, at least summarize, for each line of code that appears on the samples, the percentage of samples containing that line. That percentage is a direct measure of what would be saved if that line were not there. Here's one discussion. Here's another, and another. And, as Paul said, Zoom should do it.
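The per-line summary described above could be sketched roughly as follows. The sample format (one "file:line" string per stack frame) and the names `StackSample` and `percent_of_samples` are assumptions for illustration; real tools record frames differently.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical format: one stack sample is a list of "file:line" frames.
using StackSample = std::vector<std::string>;

// For each line, compute the percentage of samples on which it appears.
std::map<std::string, double> percent_of_samples(
        const std::vector<StackSample>& samples) {
    std::map<std::string, int> hits;
    for (const StackSample& sample : samples) {
        // Count a line once per sample, even if recursion repeats it.
        const std::set<std::string> unique(sample.begin(), sample.end());
        for (const std::string& line : unique) {
            ++hits[line];
        }
    }
    std::map<std::string, double> percent;
    if (samples.empty()) {
        return percent;
    }
    for (const auto& entry : hits) {
        percent[entry.first] = 100.0 * entry.second / samples.size();
    }
    return percent;
}
```

A line that shows up on 3 of 4 samples gets 75%, regardless of how many times it is repeated within a single sample.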

If your time went from 60 sec to 1 sec, that implies every single stack sample would have had a 59/60 probability of showing you the problem.
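The arithmetic behind that figure, as a rough sketch: if a call accounts for a fraction f of wall-clock time, a sample taken at a random moment lands in it with probability f. The symbol n (number of samples) is introduced here for illustration, the samples are assumed independent, and 60 s and 1 s are the rounded times quoted above.

```latex
f = \frac{T_{\text{before}} - T_{\text{after}}}{T_{\text{before}}}
  = \frac{60 - 1}{60} = \frac{59}{60} \approx 0.983,
\qquad
P(\text{call seen on at least one of } n \text{ samples})
  = 1 - (1 - f)^n = 1 - \left(\tfrac{1}{60}\right)^n
```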

Mike Dunlavey
Mike, your point is very valid; I agree with the technique 100%. Any ideas on how to enable time-based sampling via OProfile, or a more automated approach than just breaking in the debugger?
caspin
@Caspin: I'm on Windows, and I'm not a user of OProfile, but this link (http://oprofile.sourceforge.net/doc/opreport.html) talks about its use and presentation of stack sample data. Also this link (http://oprofile.sourceforge.net/doc/detailed-parameters.html#timer) talks about timer interrupts. I can't tell if it will take samples during I/O or other blocking calls.
Mike Dunlavey
... note that the frequency of sampling does not need to be fast, but it does need to be able to sample during blocking calls unless you want to be blind to those.
Mike Dunlavey
... I forgot to mention, **lsstack** is an easy way to get stack samples manually, as is **pstack** (but without symbols).
Mike Dunlavey
+2  A: 

Try Zoom - I believe it will let you profile all processes - it would be interesting to know if it highlights your problem in this case.

Paul R
Zoom version 1.6.6 does not find the problem either. The next version of Zoom will supposedly have a sampling mode (*"thread time profiling"*) that could find the problem.
caspin
+1  A: 

A quickly hacked-up, trivial sampling profiler for Linux: http://vi-server.org/vi/simple_sampling_profiler.html

It appends a backtrace(3) to a file on each SIGUSR1, then converts the result to annotated source.
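A minimal sketch of the SIGUSR1 half of that idea, assuming glibc's backtrace and backtrace_symbols_fd. This is not the code from the linked page; `install_sampler`, `on_sigusr1` and `g_sample_fd` are hypothetical names, and the conversion to annotated source is not shown.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <fcntl.h>     // open
#include <signal.h>    // sigaction, SIGUSR1
#include <unistd.h>    // write

// Hypothetical global: file descriptor the handler appends samples to.
static int g_sample_fd = -1;

// On SIGUSR1, append the current call stack to the sample file.
extern "C" void on_sigusr1(int) {
    void* frames[64];
    const int depth = backtrace(frames, 64);
    // backtrace_symbols_fd writes straight to the fd without calling malloc.
    backtrace_symbols_fd(frames, depth, g_sample_fd);
    const ssize_t ignored = write(g_sample_fd, "\n", 1);  // blank line between samples
    (void)ignored;
}

// Open (or create) the sample file and install the handler.
bool install_sampler(const char* path) {
    // Warm-up call: the first backtrace may load libgcc, which allocates,
    // so do it here rather than inside the signal handler.
    void* warmup[1];
    backtrace(warmup, 1);

    g_sample_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (g_sample_fd < 0) {
        return false;
    }
    struct sigaction sa = {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // restart interrupted blocking calls where possible
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}
```

After calling `install_sampler` early in the program, an external loop that sends SIGUSR1 at a fixed wall-clock interval (for example with kill from a shell) takes samples even while the process is blocked. Function names in the output usually require linking with -rdynamic.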

Vi
Take a look at how lsstack is implemented. It doesn't need special driver code to get the current backtrace. Also, put your code out on Bitbucket or Google Code. If you get a decently done project, I'll contribute bug fixes as I use it.
caspin
@caspin, OK, now I'm looking at how to make use of gdb (http://stackoverflow.com/questions/3999464/how-to-make-gdb-get-stacktrace-repeatably) to do the same thing.
Vi