views: 99
answers: 2

I have a Perl script that takes about 30 minutes to run, so of course I run Devel::NYTProf. Great profiler. For many of my subs, I'm getting some data that doesn't make sense to me.

I'm running with perl 5.10.0 on Linux using the default NYTProf settings.

In the HTML output, each of the subs has a summary section stating how much time is spent in the sub and its children and then goes on to give me line information.

The line statistics don't add up to the total spent in the function. What gives?

For example, I have a function that's reported to use 233s (57 + 166). The line-by-line report has one line that uses 20s, another that uses 4s, and one that uses 2s. The other lines are <1s, and the function is not that long.

What can I do to resolve this mismatch?

I could move to Perl 5.12 but that would take some work to install the dependencies. I'm happy to run it in a slower mode. Is there a way to increase the sampling frequency? Run on a slower machine?

Here's a sample of my NYTProf output. In this case, the sub is reported to use 225 seconds, but adding up all of the per-line numbers yields 56 seconds. This run had optimization turned off:

setenv NYTPROF optimize=0:file=nytprof.optout
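
For reference, after setting that variable the rest of the run is roughly this (the script name here is just a placeholder):

    # profile the script, then render the HTML report from the named output file
    perl -d:NYTProf myscript.pl
    nytprofhtml -f nytprof.optout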

Update: I've rerun with Perl 5.12 using the findcaller=1 option as suggested, with more or less the same results. (I ran on a different dataset.)

+2  A: 

Try disabling perl's optimizer. From CPAN's Devel::NYTProf docs:

optimize=0

Disable the perl optimizer.

By default NYTProf leaves perl's optimizer enabled. That gives you more accurate profile timing overall, but can lead to odd statement counts for individual sets of lines. That's because perl's peephole optimizer has effectively rewritten the statements, but you can't see what the rewritten version looks like.

For example:

1     if (...) {
2         return;
3     }

may be rewritten as

1     return if (...)

so the profile won't show a statement count for line 2 in your source code because the return was merged into the if statement on the preceding line.

Using the optimize=0 option disables the optimizer so you'll get lower overall performance but more accurately assigned statement counts.

If you find any other examples of the effect of the optimizer on NYTProf output (other than performance, obviously) please let us know.
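
To see the effect for yourself, you could profile a small throwaway script (hypothetical, not from your code) like the one below, once with the default settings and once with optimize=0, and compare the statement counts reported for the bare `return;` line:

    # optdemo.pl - run as: perl -d:NYTProf optdemo.pl
    # With the optimizer enabled, the bare "return;" line may show no
    # statement count because it can be merged into the "if" above it.
    sub maybe_skip {
        my ($x) = @_;
        if ($x % 2) {
            return;
        }
        return $x * 2;
    }

    maybe_skip($_) for 1 .. 100_000;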

Pedro Silva
@Pedro Silva: I've added a link to some of my output to the question. Thanks for your time. I reran with the optimize=0 flag set.
mmccoo
So everything seems to check out: 'Time on line' for `sub add_bit_to_map` is probably exclusive time (it excludes time spent on calls to libraries, etc.). When you sum the 56s across the major lines in that sub, plus some change from the microsecond lines, it adds up to about 57.2s, which is exactly the exclusive time NYTProf reports for that sub.
Pedro Silva
+2  A: 

I've just added this to the NYTProf documentation:

=head2 If Statement and Subroutine Timings Don't Match

NYTProf has two profilers: a statement profiler that's invoked when perl moves from one perl statement to another, and a subroutine profiler that's invoked when perl calls or returns from a subroutine.

The individual statement timings for a subroutine usually add up to slightly less than the exclusive time for the subroutine. That's because the handling of the subroutine call and return overheads is included in the exclusive time for the subroutine. The difference may only be a few microseconds, but that can become noticeable for subroutines that are called hundreds of thousands of times.

That probably explains the difference between the sum of the statement time column (31.7s) and the exclusive time reported for the subroutine (57.2s). The difference amounts to approximately 100 microseconds per call (which seems a little high, but not unreasonably so).

The statement profiler keeps track of how much time was spent on overheads, like writing statement profile data to disk. The subroutine profiler subtracts the difference in overheads between entering and leaving the subroutine in order to give a more accurate profile.

The statement profiler is generally very fast because most writes get buffered for zip compression, so the profiler overhead per statement tends to be very small, often a single 'tick'. The result is that the accumulated overhead is quite noisy. This becomes more significant for subroutines that are called frequently and are also fast (in this case 250303 calls at 899µs/call). So I suspect this is another, smaller, contribution to the discrepancy between the summed statement times and the exclusive time.
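
If you want a synthetic illustration (this is not your code), a trivial sub called a few hundred thousand times typically shows the effect clearly: the per-call entry and return handling is charged to the sub's exclusive time but never to any individual statement line, so the statement times sum to less than the exclusive time.

    # tinycalls.pl - run as: perl -d:NYTProf tinycalls.pl
    # A trivial sub called many times; its exclusive time includes the
    # call/return handling, which never appears on any statement line.
    sub tiny {
        my ($n) = @_;
        return $n + 1;
    }

    my $total = 0;
    $total += tiny($_) for 1 .. 250_000;
    print "$total\n";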

A bigger concern to me, as the author, is the difference between the reported time spent calling other subroutines (168s) and the sum of the calls that can be seen in the report (24.6s).

I suspect what's happening here is that calls to other subroutines are being made, but NYTProf isn't able to determine the calling location correctly, so they don't appear in the right place in the report. (This is a common problem with older perl versions. Perl 5.10.1 should be a trivial drop-in upgrade over 5.10.0 as it's binary compatible.)

I will investigate and see if there's a way to identify and report the details of the 'missing' calls. Meanwhile you could try the findcaller=1 option. That may help (though it may also cause other issues).
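
For example, combining it with the options already in use (csh, as in the question) would look something like:

    setenv NYTPROF findcaller=1:optimize=0:file=nytprof.optout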

The overall subroutine timing is accurate and should be trusted more than the sum of statement or nested sub call timings.

I'd also recommend asking questions about NYTProf on the mailing list.

Tim Bunce