I need to profile a real time C++ app on Windows. Most of the available profilers are either terribly expensive, total overkill, or both. I don't need any .NET stuff. Since it is a real time app, I need the profiler to be as fast as possible. It would be excellent if it integrated in some way with Visual Studio 2005/2008, but that's not necessary. If this description reminds you of a profiler that you have used, I would really like to know about it. I am hoping to draw from people's use of C++ profilers on Windows to pinpoint one that will do the job. Thanks.
When I have to profile realtime code, I think the only solution is something hand-rolled. You don't want too much coverage or you end up slowing the code down, but with a small data set, you need to be very focused, essentially picking each point by hand.
So I wrote a header file several years ago that defines some macros and a mechanism for capturing data, either as function timings or as a timeline (at time T in function X). The code uses QueryPerformanceCounter for the timings and writes the data into named shared memory via CreateFileMapping so that I can look at the timing data from another process live.
It takes a recompile to change what timing information I want to capture, but the instrumentation is so inexpensive that it has virtually no effect on the code being measured.
All of the code is in the header file (with macro guards so the code only gets included once), so the header file itself is my 'profiler'. I change some tables in the header, mark up the target code, recompile and start profiling.
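For anyone who wants a starting point, here is a minimal sketch of that kind of header. It is not the original file: the point names (kDecode, kMix, kOutput), the PROFILE_SCOPE macro and the decode_block function are all made up for illustration. To keep it portable it uses std::chrono::steady_clock and a plain static table; on Windows you would swap in QueryPerformanceCounter for the clock and place the table in a view of a named mapping created with CreateFileMapping so another process can read it live. It is also not thread-safe as written.

```cpp
#ifndef PROFILE_POINTS_H  // macro guard so the header body is only included once
#define PROFILE_POINTS_H

#include <cassert>
#include <chrono>
#include <cstdint>

namespace prof {

// Hand-picked measurement points (hypothetical names).
// Edit this table and recompile to change what gets captured.
enum Point { kDecode, kMix, kOutput, kPointCount };

struct Slot {
    std::uint64_t calls;     // number of completed scopes
    std::uint64_t total_ns;  // accumulated time in nanoseconds
    std::uint64_t max_ns;    // slowest single scope
};

// Fixed-size table, no allocation; function-local statics start zeroed.
inline Slot* slots() {
    static Slot table[kPointCount];
    return table;
}

// Portable stand-in for QueryPerformanceCounter.
inline std::uint64_t now_ns() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Times the enclosing scope and adds the result to one slot.
class ScopeTimer {
public:
    explicit ScopeTimer(Point p) : point_(p), start_(now_ns()) {
        assert(p < kPointCount);
    }
    ~ScopeTimer() {
        const std::uint64_t elapsed = now_ns() - start_;
        Slot& s = slots()[point_];
        ++s.calls;
        s.total_ns += elapsed;
        if (elapsed > s.max_ns) s.max_ns = elapsed;
    }
private:
    Point point_;
    std::uint64_t start_;
};

}  // namespace prof

// Define PROFILE_DISABLED to compile every measurement point away.
#ifndef PROFILE_DISABLED
#define PROFILE_SCOPE(point) ::prof::ScopeTimer prof_timer_##point(::prof::point)
#else
#define PROFILE_SCOPE(point) ((void)0)
#endif

#endif  // PROFILE_POINTS_H

// Marking up target code (decode_block is a stand-in for real work).
inline int decode_block(int x) {
    PROFILE_SCOPE(kDecode);
    return x * 2;
}
```

The cost per measurement point is two clock reads and a few additions, which is why this style stays cheap as long as the points are chosen by hand rather than sprinkled everywhere.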
Give consideration to the no-profiler option.
It is common wisdom regarding performance problems that measuring is a prerequisite to finding them.
Not so. More on that subject.
This is an example of tuning an app for maximum performance.
In case you're worried about overhead and how to do this in a DSP real-time app, here's how I would do it. Get it running with realistic input, and just halt it in its tracks with the pause button. Capture the call stack into Notepad. Then start it up again and repeat several times. (Throw away any sample that is irrelevant, like waiting for user input or otherwise in idle state.) Notice that this process puts zero profiling overhead on the program.
If you have the problem that your code runs off a timer and is actually running for only a small fraction of the overall time, then a) you may decide that you don't actually have a problem, or b) you can still try to make it go faster. If (b) then wrap a loop around your code so that whatever it does, it repeats 10, 100, or 1000 times, making it take a large enough fraction of time so that samples will land in it. Use those samples to find out what to fix to make it faster. When you are done, remove the outer loop, and it will run like a bandit.
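The outer loop can be as simple as the sketch below. The names (repeat_for_sampling, kSamplingRepeat, process_tick, on_timer) are invented for illustration, and it assumes the work is safe to run more than once per tick, which you have to check for your own code.

```cpp
#include <cassert>

// Temporary repeat factor; set back to 1 (or remove the wrapper) when done.
const int kSamplingRepeat = 100;

// Runs `work` `times` times so that it occupies a larger share of wall time
// and manual stack samples are more likely to land inside it.
template <typename Work>
void repeat_for_sampling(int times, Work work) {
    assert(times > 0);
    for (int i = 0; i < times; ++i) {
        work();
    }
}

// Hypothetical stand-in for the code that normally runs once per timer tick.
static int g_ticks_processed = 0;
inline void process_tick() { ++g_ticks_processed; }

// Hypothetical timer callback with the temporary outer loop in place.
inline void on_timer() {
    repeat_for_sampling(kSamplingRepeat, process_tick);
}
```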
"Since it is a real time app, I need the profiler to be as fast as possible."
I don't know what you mean by real-time (hard, semi-hard, soft).
I once had to improve the performance of a fax server. The fax protocol is such that if either end delays too long (some tens or hundreds of milliseconds, depending) then the fax session is disconnected. I was therefore unable to use any commercial profiler that was available to me, because they slowed the execution of the server too much. Instead I added various log messages (with time stamps) to instrument the code and thus find the bottlenecks.
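A rough sketch of that kind of timestamped logging follows. It is not the code from the fax server: the TimedLog class and its labels are made up, and buffering entries in memory and printing them only after the timing-sensitive part has finished is an assumption on my part, chosen so that the logging itself does not add I/O delays.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// In-memory log of labelled timestamps, printed after the fact.
class TimedLog {
public:
    typedef std::chrono::steady_clock Clock;

    explicit TimedLog(std::size_t capacity) : capacity_(capacity) {
        entries_.reserve(capacity);  // allocate up front, not while timing
    }

    // Records a label and the current time. The label pointer is stored as-is,
    // so pass string literals. Entries beyond the capacity are dropped.
    void mark(const char* label) {
        if (entries_.size() < capacity_) {
            Entry e = {label, Clock::now()};
            entries_.push_back(e);
        }
    }

    std::size_t size() const { return entries_.size(); }

    // Microseconds between entry i and the entry recorded just before it.
    long long micros_since_previous(std::size_t i) const {
        assert(i > 0 && i < entries_.size());
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   entries_[i].when - entries_[i - 1].when).count();
    }

    // Prints each entry (from the second onward) with the time since the previous one.
    void dump(std::FILE* out) const {
        for (std::size_t i = 1; i < entries_.size(); ++i) {
            std::fprintf(out, "%-24s +%lld us\n",
                         entries_[i].label, micros_since_previous(i));
        }
    }

private:
    struct Entry {
        const char* label;
        Clock::time_point when;
    };
    std::size_t capacity_;
    std::vector<Entry> entries_;
};
```

Typical use would be to call mark() at a handful of suspect points during a session, then call dump(stderr) once the session is over and look for the largest gaps.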
We do a fair amount of profiling, and have used Shark (OSX only), vTune, Glowcode and the old favourite of counters/clocks.
Of those Shark is by far and away the best (and free!), to the extent that I try to keep code portable to OSX so I can use it to profile. Unfortunately, it doesn't meet your requirements.
vTune was wholly unimpressive. It was too complicated to get a decent profile out of without being an expert in what all the profiling options do, the front-end GUI frequently crashed or just plain broke, and its sampler doesn't sample the call stack, which makes it almost useless for actually seeing how bottlenecks in your program arise. It was also expensive (although we ended up buying a licence). In its favour, it is cross-platform, and you can get a 30-day trial to see if you like it.
Glowcode was decent. IIRC it is Windows only, and it also offers a free trial. It's been a while since we used it, but it might not be a bad place to start.
We mostly use clocks for our embedded code, which runs single process with little or no system overhead - meaning we can count exactly the number of clock cycles operations take. Personally I wouldn't recommend "rolling your own" profiling code (except at an extremely coarse scale) for two reasons:
- It's difficult to account for how your process gets scheduled and what other operations are running.
- Profilers often highlight hotspots you would never have considered to be hotspots without their intervention.
I've used AMD CodeAnalyst to great effect, but naturally it has to run on an AMD processor. It's more a case of "tells you more than you want to know" if you dig deep enough. http://developer.amd.com/cpu/codeanalyst/Pages/default.aspx
I occasionally use an application called Very Sleepy: http://www.codersnotes.com/sleepy
It's a simple, unassuming tool, and I don't know how well it suits your needs. It's done well enough for me, as a fairly straightforward sampling profiler. I am authoring a .NET profiler called SlimTune that will gain native support eventually -- but it's not in there now, and it could be some months before it's available.
Performance Validator (from Software Verification, the company I work for) seems to match what you are looking for:
- Sampling and non-sampling modes.
- C, C++, Delphi, Windows.