All too often I read statements about some new framework and its "benchmarks." My question is a general one, but with these specific points in mind:

  1. What approach should a developer take to effectively instrument code to measure performance?

  2. When reading about benchmarks and performance testing, what are some red-flags to watch out for that might not represent real results?

+1  A: 

It depends what you're trying to do.

1) If you want to maintain general timing information, so you can be alert to regressions, various instrumenting profilers are the way to go. Make sure they measure all kinds of time, not just CPU time.
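
As an illustration of that kind of instrumentation, here is a minimal hand-rolled sketch (the ScopedTimer and loadAndProcess names are made up for the example, not taken from any tool mentioned here): a small RAII timer on a steady wall clock, so waits and I/O are captured as well as computation:

    #include <chrono>
    #include <cstdio>

    // Minimal instrumentation sketch for tracking timings across releases.
    // steady_clock measures elapsed wall time, so blocking I/O and waits
    // are included, not just CPU time.
    struct ScopedTimer {
        const char* label;
        std::chrono::steady_clock::time_point start;

        explicit ScopedTimer(const char* l)
            : label(l), start(std::chrono::steady_clock::now()) {}

        ~ScopedTimer() {
            auto end = std::chrono::steady_clock::now();
            double ms = std::chrono::duration<double, std::milli>(end - start).count();
            std::printf("%s: %.3f ms\n", label, ms);  // log or store to spot regressions
        }
    };

    void loadAndProcess() {
        ScopedTimer t("loadAndProcess");  // prints elapsed wall time on scope exit
        // ... the work being timed ...
    }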

2) If you want to find ways to make the software faster, that is a distinctly different problem.
You should put the emphasis on the find, not on the measure.

  • For this, you need something that samples the call stack, not just the program counter (over multiple threads, if necessary). That rules out profilers like gprof.

  • Importantly, it should sample on wall-clock time, not CPU time, because you are every bit as likely to lose time due to I/O as due to crunching. This rules out some profilers.

  • It should be able to take samples only when you care, such as not when waiting for user input. This also rules out some profilers.

  • Finally, and very important, is the summary you get. It is essential to get per-line percent of time. The percent of time used by a line is the percent of stack samples containing the line. Don't settle for function-only timings, even with a call graph. This rules out still more profilers. (Forget about "self time", and forget about invocation counts. Those are seldom useful and often misleading.)

Accuracy of finding the problems is what you're after, not accuracy of measuring them. That is a very important point. (You don't need a large number of samples, though it does no harm. The harm is in your head, making you think about measuring, rather than about what the program is doing.)
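
To make the per-line percent idea concrete, here is a toy sketch (the stack samples and file names are invented for illustration): each sample is the set of source lines on the call stack at one instant, and a line's percentage is simply the fraction of samples that contain it. A line such as parse.cpp:42 showing up in 75% of samples is the kind of signal worth acting on.

    #include <cstdio>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    // Illustrative sketch (not any profiler's actual code): given stack samples
    // taken on wall-clock time, report per-line percent as the fraction of
    // samples whose stack contains that line. A line is counted once per
    // sample even if it appears in several frames (recursion).
    int main() {
        // Each sample is the list of "file:line" locations on the stack at that moment.
        std::vector<std::vector<std::string>> samples = {
            {"main.cpp:10", "parse.cpp:42", "io.cpp:7"},
            {"main.cpp:10", "parse.cpp:42"},
            {"main.cpp:10", "render.cpp:99"},
            {"main.cpp:10", "parse.cpp:42", "io.cpp:7"},
        };

        std::map<std::string, int> hits;
        for (const auto& sample : samples) {
            std::set<std::string> unique(sample.begin(), sample.end());
            for (const auto& line : unique)
                ++hits[line];
        }

        for (const auto& [line, count] : hits)
            std::printf("%-14s %5.1f%%\n", line.c_str(),
                        100.0 * count / samples.size());
        return 0;
    }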

One good tool for this is RotateRight's Zoom profiler. Personally I rely on manual sampling.

Mike Dunlavey
+2  A: 

There are two methods of measuring performance: using code instrumentation and using sampling.

The commercial profilers (Hi-Prof, Rational Quantify, AQTime) I used in the past used code instrumentation (some of them could also use sampling), and in my experience this gives the best, most detailed results. Rational Quantify in particular lets you zoom in on results, focus on subtrees, remove complete call trees to simulate an improvement, ...

The downside of these instrumenting profilers is that they:

  • tend to be slow (your code runs about 10 times slower)
  • take quite some time to instrument your application
  • don't always correctly handle exceptions in the application (in C++)
  • can be hard to set up if you have to disable the instrumentation of certain DLLs (we had to disable instrumentation for the Oracle DLLs)

The instrumentation also sometimes skews the times reported for low-level functions like memory allocations, critical sections, ...

The free profilers I use (Very Sleepy, Luke Stackwalker) rely on sampling, which makes it much easier to run a quick performance test and see where the problem lies. These free profilers don't have the full functionality of the commercial ones (although I submitted the "focus on subtree" functionality for Very Sleepy myself), but since they are fast, they can be very useful.

At this time, my personal favorite is Very Sleepy, with Luke Stackwalker coming second.

In both cases (instrumenting and sampling), my experience is that:

  • It is very difficult to compare the results of profilers over different releases of your application. If you have a performance problem in your release 2.0, profile your release 2.0 and try to improve it, rather than looking for the exact reason why 2.0 is slower than 1.0.
  • You must never compare the profiling results with the timing (real time, cpu time) results of an application that is run outside the profiler. If your application consumes 5 seconds CPU time outside the profiler, and when run in the profiler the profiler reports that it consumes 10 seconds, there's nothing wrong. Don't think that your application actually takes 10 seconds.
  • That's why you must consistently check results in the same environment. Consistently compare results of your application when run outside the profiler, or when run inside the profiler. Don't mix the results.
  • Also use a consistent environment and system. If you get a faster PC, your application could still run slower, e.g. because the screen is larger and more needs to be updated on screen. If moving to a new PC, retest the last (one or two) releases of your application on the new PC so you get an idea on how times scale to the new PC.
  • This also means: use fixed data sets and check your improvements on these data sets. It could be that an improvement in your application improves performance on dataset X but makes it slower with dataset Y. In some cases this may be acceptable.
  • Discuss with the testing team what results you want to obtain beforehand (see Oded's answer on my own question http://stackoverflow.com/questions/2341034).
  • Realize that a faster application can still use more CPU time than a slower application, if the faster one uses multi-threading and the slower one doesn't. Discuss (as said before) with the testing team what needs to be measured and what doesn't (in the multi-threading case: real time instead of CPU time).
  • Realize that many small improvements may lead to one big improvement. If you find 10 parts in your application that each take 3% of the time and you can reduce each of them to 1%, your application will be about 20% faster (see the quick arithmetic check after this list).
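
As a quick sanity check on that last bullet (plain arithmetic, just written out as a sketch): shaving ten 3% parts down to 1% each removes 20% of the runtime, so the new runtime is 80% of the original, a 1.25x speed-up.

    #include <cstdio>

    // Sanity check of the "many small improvements" arithmetic:
    // ten parts at 3% of total time each, reduced to 1% each.
    int main() {
        double total = 1.0;                 // normalized runtime
        double saved = 10 * (0.03 - 0.01);  // 10 parts, each shrinks by 2 points
        double newTotal = total - saved;    // 0.80 of the original time
        std::printf("time saved : %.0f%%\n", saved * 100);            // 20%
        std::printf("new runtime: %.0f%% of original\n", newTotal * 100);
        std::printf("speed-up   : %.2fx\n", total / newTotal);        // 1.25x
        return 0;
    }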
Patrick
+1 for a thorough and thoughtful answer. I must contrast your last point with my experience (http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773). Users of most profilers are happy to get small improvements like 20%, when much larger factors are often possible. To get those larger factors, IME it is most effective to use the kind of sampling I outlined.
Mike Dunlavey