I know I can profile my code with gprof
and kprof
on Linux. Is there a comparable alternative to these applications on Windows?
There's a MinGW port of gprof that works just about the same as the Linux variant. You can either get a full MinGW installation (I think gprof is included but not sure) or get gprof from the MinGW binutils package.
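If you go the MinGW route, the workflow is essentially the same as on Linux: compile and link with -pg, run the program so it writes gmon.out, then feed both to gprof. A minimal sketch (file names are placeholders):

```cpp
// demo.cpp -- trivial program to exercise the MinGW port of gprof.
#include <cmath>
#include <iostream>

// Deliberately CPU-heavy so it shows up clearly in the flat profile.
double burn_cpu(int iterations) {
    double sum = 0.0;
    for (int i = 1; i <= iterations; ++i)
        sum += std::sqrt(static_cast<double>(i));
    return sum;
}

int main() {
    std::cout << burn_cpu(50000000) << '\n';
}

// From a MinGW shell:
//   g++ -O2 -pg demo.cpp -o demo.exe     (-pg inserts the profiling hooks)
//   demo.exe                             (running it writes gmon.out)
//   gprof demo.exe gmon.out > report.txt
```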
For Eclipse, there's TPTP but it doesn't support profiling C/C++ as far as I know.
Commercial software:
- Rational Quantify (expensive, slow, but very detailed)
- AQTime (less expensive, less slow, a bit detailed)
Free software:
- Very Sleepy (www.codersnotes.com)
- Luke StackWalker (somewhere on SourceForge)
These commercial alternatives change the compiled code by 'instrumenting' it (adding instructions) and perform the timing within those added instructions. This means they slow your application down considerably.
The free alternatives use sampling instead, which makes them less detailed but very fast. In practice I have found Very Sleepy in particular very good for taking a quick look at performance problems in your application.
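To make the distinction concrete, here is roughly what 'instrumenting' amounts to if you did it by hand: a timer wrapped around each function of interest (the class and names below are my own illustration, not anything the tools above expose). An instrumenting profiler injects the equivalent of this into every function, which is why it is detailed but slow; a sampler skips all of it and just interrupts the running program periodically to record the call stack.

```cpp
#include <chrono>
#include <cstdio>

// Hand-rolled stand-in for what an instrumenting profiler injects:
// time the span between construction and destruction, then report it.
struct ScopedTimer {
    const char* name;
    std::chrono::steady_clock::time_point start;

    explicit ScopedTimer(const char* n)
        : name(n), start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("%s took %lld us\n", name, static_cast<long long>(us));
    }
};

void do_work() {
    ScopedTimer t("do_work");   // an instrumenting profiler adds this to every function
    long sum = 0;
    for (long i = 0; i < 10000000; ++i) sum = sum + i;
    std::printf("sum = %ld\n", sum);
}

int main() {
    do_work();
}
```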
What's the reason for profiling? Do you want to a) measure times and get a call graph, or b) find things to change to make the code faster? (These are not the same.)
If (b), you can use this trick, which relies on the Pause button in Eclipse.
Added: Maybe it would help to convey some experience of what performance problems are actually like, and where you can expect to find them. Here are some simple examples:
An insertion sort (order n^2) where the items being sorted are strings, compared by a string-compare function. Where is the hot spot? In string-compare. Where is the problem? In the sort, where string-compare is called. If n=10 it's not a problem, but at n=1000 it suddenly takes a long time. The point where string-compare is called is "cold", but that's where the problem is. A small number of call-stack samples pinpoints it with certainty.
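A sketch of that situation in C++ (illustrative only): every sample lands inside the string comparison, but the fix belongs at the call site, in the O(n^2) sort that invokes it a quadratic number of times.

```cpp
#include <string>
#include <vector>

// O(n^2) insertion sort over strings. Nearly all the time is spent inside
// std::string::compare (the "hot spot"), but the real problem is this loop,
// which calls it roughly n^2/2 times.
void insertion_sort(std::vector<std::string>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        std::string key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1].compare(key) > 0) {  // "cold" call site, real problem
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}

// The fix is not a faster compare; it is replacing the caller,
// e.g. with std::sort(v.begin(), v.end()), which is O(n log n).
```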
An app that loads plugins takes a long time to start up. A profiler says basically everything in it is "cold". Something that measures I/O time says it is almost all I/O time, which seems like what you might expect, so it might seem hopeless. But stack samples show a large percentage of time is spent with the stack about 20 layers deep, in the process of reading the resource part of plugin DLLs in order to translate string constants into the local language. Investigating further, you find that most of the strings being translated are not the kind that actually need translation. They were just put in "in case" they might need translation, and were never thought of as something that could cause a performance problem. Fixing that issue brings a hefty time savings.
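A rough sketch of that pattern, with all names invented for illustration: the eager version translates every resource string at load time "in case" it is needed; deferring the lookup until a string is actually displayed eliminates most of the work.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// All names here are invented; the stubs stand in for the real
// resource-reading and translation machinery.
std::string translate(const std::string& s) {         // pretend this is the slow part
    return "[translated] " + s;
}
std::vector<std::string> load_resource_strings(const std::string& /*dll*/) {
    return {"File", "Edit", "SomeInternalDebugLabel"};
}

// Before: every string is translated eagerly at startup.
// This is the work the stack samples pointed at, ~20 frames deep.
std::vector<std::string> load_plugin_eager(const std::string& dll) {
    std::vector<std::string> out;
    for (const auto& s : load_resource_strings(dll))
        out.push_back(translate(s));                   // work nobody has asked for yet
    return out;
}

// After: keep the raw string and translate only when it is actually displayed.
struct LazyString {
    std::string raw;
    std::string display() const { return translate(raw); }
};

int main() {
    for (const auto& s : load_plugin_eager("some_plugin.dll"))
        std::printf("%s\n", s.c_str());
    LazyString lazy{"File"};
    std::printf("%s\n", lazy.display().c_str());       // translated on demand
}
```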
So it is common to think in terms of "hotspots" and "bottlenecks", but most programs, especially larger ones, tend to have performance problems in the form of function calls requesting work that doesn't really need to be done. Fortunately, those calls show up on the call stack during the very time they are consuming.