My understanding is that by default gprof takes into account CPU time. Is there a way to get it to profile based on wall-clock time?

My program does a lot of disk i/o, so the CPU time it uses only represents a fraction of the actual execution time. I need to know which portions of the disk i/o take up the most time.
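To make the gap between the two clocks concrete, here is a minimal Python sketch (not part of the original program). The `measure` and `fake_io` names are hypothetical, and `time.sleep` is only a stand-in for a blocking disk read.

```python
import time

def measure(work):
    """Return (wall_seconds, cpu_seconds) spent running work()."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time of this process (user + system)
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

def fake_io():
    # Hypothetical stand-in for a blocking disk read:
    # the process waits without consuming CPU.
    time.sleep(0.2)

wall, cpu = measure(fake_io)
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
```

A profiler that only accounts for CPU time would see almost none of the time spent in `fake_io`, which is the situation described above.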

+1  A: 

gprof won't do this. Look at this.

And this.

In a nutshell: Run the program under gdb, interrupt it with Ctrl-C (or Ctrl-Break) about 10 times at random moments, and display the call stack each time (for example with bt). If your I/O is taking (for example) 60% of the time, then on (roughly) 6 out of 10 pauses, you will see it in the writebuf or readbuf routine, and the lines of code requesting that I/O will be clearly displayed on the stack.

You could also use lsstack to get the same information.
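The random-pause idea can be imitated inside a single process. The following is a rough Python sketch of that idea, not the gdb procedure itself. The `read_block`, `compute`, `workload`, and `sample_stacks` names are hypothetical, `time.sleep` stands in for blocking I/O, and `sys._current_frames()` is a CPython-specific call.

```python
import random
import sys
import threading
import time
import traceback
from collections import Counter

def read_block():
    time.sleep(0.006)        # hypothetical stand-in for blocking I/O

def compute():
    end = time.perf_counter() + 0.004
    while time.perf_counter() < end:
        pass                 # busy loop standing in for CPU work

def workload(stop):
    while not stop.is_set():
        read_block()
        compute()

def sample_stacks(thread_id, n_samples, max_gap=0.05):
    """At random wall-clock moments, record the target thread's call stack."""
    stacks = []
    for _ in range(n_samples):
        time.sleep(random.uniform(0.01, max_gap))
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            # Keep only the function names, outermost call first.
            stacks.append([f.name for f in traceback.extract_stack(frame)])
    return stacks

stop = threading.Event()
worker = threading.Thread(target=workload, args=(stop,))
worker.start()
stacks = sample_stacks(worker.ident, n_samples=10)
stop.set()
worker.join()

# Count on how many samples each function appeared anywhere on the stack.
hits = Counter(name for stack in stacks for name in set(stack))
for name, count in hits.most_common():
    print(f"{name}: on {count} of {len(stacks)} samples")
```

Because the samples are taken in wall-clock time, the time spent waiting in `read_block` shows up in proportion to how long the thread is blocked there, even though it uses almost no CPU.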

Mike Dunlavey
Hmm... wouldn't this method be very statistically inaccurate? Is there an automated way of doing this, which takes much more than 10 samples, say 1000 samples, but at uniform intervals, and then reports which functions were encountered most often?
jetwolf
@jetwolf: Zoom is an example of a profiler that does it with 10^3 samples, but check the first link, especially items 5, 2, 7, and 9.
Mike Dunlavey
@jetwolf: Example: suppose I/O is exactly 60%. The standard deviation of the number of samples that show it is sqrt(N*F*(1-F)), where N is the number of samples and F is the fraction of time (here 0.6). For 10 samples that is +/- 1.55; for 1000, it is +/- 15.5. So in 10 samples you will see it roughly 4.45 - 7.55 times. In 1000 samples you will see it roughly 584.5 - 615.5 times. Either way, you'll see exactly what's causing it, so if it's fixable, you can fix it.
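The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch of the binomial formula quoted above. The `sample_spread` name is hypothetical.

```python
import math

def sample_spread(n_samples, fraction):
    """Return (expected hits, standard deviation of hits) for a binomial count."""
    mean = n_samples * fraction
    sd = math.sqrt(n_samples * fraction * (1 - fraction))
    return mean, sd

for n in (10, 1000):
    mean, sd = sample_spread(n, 0.6)
    print(f"N={n}: {mean - sd:.2f} to {mean + sd:.2f} hits "
          f"(+/- {100 * sd / n:.2f}% of samples)")
```

The standard deviation of the hit count grows like sqrt(N), so as a percentage of the samples it shrinks like 1/sqrt(N).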
Mike Dunlavey
@jetwolf: Another way to put it. Suppose I/O is exactly 60%. 10 samples would measure 60%, give or take 15%. For 1000 samples, it would be give or take 1.5%. For 100,000 samples it would be 0.15%. So they all measure it, but they only show you the cause of the problem if they summarize at the level of lines on the call stack. Profilers that take lots of samples tend to throw that information away, so the extra measurement precision comes at the expense of finding the problem.
Mike Dunlavey
+1  A: 

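You can use strace or cachegrind to look into this. strace reports the time spent in system calls, which covers the disk I/O: strace -T prints the time spent in each call, and strace -c prints a per-call summary (system CPU time by default, wall-clock time with -w). cachegrind simulates cache behavior and counts instructions, so it gives a detailed analysis of CPU-side resource utilization rather than wall-clock time.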

Rohit