I need to profile a real time C++ app on Windows. Most of the available profilers are either terribly expensive, total overkill, or both. I don't need any .NET stuff. Since it is a real time app, I need the profiler to be as fast as possible. It would be excellent if it integrated in some way with Visual Studio 2005/2008, but that's not necessary. If this description reminds you of a profiler that you have used, I would really like to know about it. I am hoping to draw from people's use of C++ profilers on Windows to pinpoint one that will do the job. Thanks.

+3  A: 

When I have to profile realtime code, I think the only solution is something hand-rolled. You don't want too much coverage or you end up slowing the code down, but with a small data set, you need to be very focused, essentially picking each point by hand.

So I wrote a header file several years ago that defines some macros and a mechanism for capturing data, either as function timings or as a timeline (at time T in function X). The code uses QueryPerformanceCounter for the timings and writes the data into named shared memory via CreateFileMapping so that I can look at the timing data from another process live.

It takes a recompile to change what timing information I want to capture, but the code is so inexpensive that it has virtually no effect on the code being measured.

All of the code is in the header file (with macro guards so the code only gets included once), so the header file itself is my 'profiler'. I change some tables in the header, mark up the target code, recompile, and start profiling.
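
A minimal sketch of what such a header could look like, not the actual header described above. All names (`prof`, `PROF_SCOPE`, `kMixAudio`, `mix_audio_example`) are hypothetical. It uses `std::chrono::steady_clock` in place of QueryPerformanceCounter so it builds anywhere, and a plain static table in place of the shared memory from CreateFileMapping. It is not thread-safe.

```cpp
#ifndef PROF_SKETCH_H  // macro guard so the code is only included once
#define PROF_SKETCH_H

#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

namespace prof {

// The "table": edit this list of capture points, then recompile.
enum Point : std::size_t { kMixAudio, kDecode, kPointCount };

struct Slot {
    std::uint64_t calls = 0;     // how many times the point was hit
    std::uint64_t total_ns = 0;  // accumulated time in nanoseconds
};

// Assumption: a static array stands in for a named shared-memory block.
inline Slot* slots() {
    static Slot table[kPointCount];
    return table;
}

// Times the enclosing scope and adds the result to the point's slot.
class ScopeTimer {
public:
    explicit ScopeTimer(Point p)
        : point_(p), start_(std::chrono::steady_clock::now()) {
        assert(p < kPointCount);
    }
    ~ScopeTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        Slot& s = slots()[point_];
        ++s.calls;
        s.total_ns += static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
    }
    ScopeTimer(const ScopeTimer&) = delete;
    ScopeTimer& operator=(const ScopeTimer&) = delete;

private:
    Point point_;
    std::chrono::steady_clock::time_point start_;
};

}  // namespace prof

// Define PROF_DISABLED to compile every capture point away.
#ifndef PROF_DISABLED
#define PROF_CONCAT2(a, b) a##b
#define PROF_CONCAT(a, b) PROF_CONCAT2(a, b)
#define PROF_SCOPE(p) prof::ScopeTimer PROF_CONCAT(prof_timer_, __LINE__)(p)
#else
#define PROF_SCOPE(p) ((void)0)
#endif

// Example of marked-up target code (hypothetical workload).
inline int mix_audio_example() {
    PROF_SCOPE(prof::kMixAudio);
    int acc = 0;
    for (int i = 0; i < 100; ++i) acc += i;
    return acc;
}

#endif  // PROF_SKETCH_H
```

Each capture point costs two clock reads and two additions, which is why picking the points by hand keeps the overhead small.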

John Knoeller
Agreed. Also, if you don't need real-time analysis, just dump QueryPerformanceCounter results into a history buffer, and at the end of the program output a CSV file with values converted to milli/microseconds. Excel/OpenOffice is a surprisingly good tool for graphing results.
Justicle
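A rough sketch of the history-buffer idea from the comment above, with hypothetical names (`History`, `Sample`, `record`, `csv`). It assumes raw tick counts are recorded during the run and converted to microseconds only when the file is written. On Windows the ticks would come from QueryPerformanceCounter and `ticks_per_second` from QueryPerformanceFrequency; here both are plain integers so the sketch stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Sample {
    int point_id;         // which capture point produced the sample
    std::uint64_t ticks;  // raw elapsed ticks, unconverted
};

class History {
public:
    // Allocate once up front so recording never allocates.
    explicit History(std::size_t capacity) : capacity_(capacity) {
        samples_.reserve(capacity);
    }

    // Hot path: one bounds check and one store. Samples past capacity are dropped.
    void record(int point_id, std::uint64_t ticks) {
        if (samples_.size() < capacity_) samples_.push_back({point_id, ticks});
    }

    std::size_t size() const { return samples_.size(); }

    // Cold path: convert ticks to microseconds and emit one CSV row per sample.
    void write_csv(std::ostream& out, std::uint64_t ticks_per_second) const {
        assert(ticks_per_second != 0);
        out << "point_id,microseconds\n";
        for (const Sample& s : samples_) {
            const double us = static_cast<double>(s.ticks) * 1e6 /
                              static_cast<double>(ticks_per_second);
            out << s.point_id << ',' << us << '\n';
        }
    }

    // Convenience wrapper that returns the CSV as a string.
    std::string csv(std::uint64_t ticks_per_second) const {
        std::ostringstream out;
        write_csv(out, ticks_per_second);
        return out.str();
    }

private:
    std::size_t capacity_;
    std::vector<Sample> samples_;
};
```

Passing a `std::ofstream` to `write_csv` at program exit gives a file that a spreadsheet can graph directly.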
Interesting. Thanks for this. It does seem to be the way I will be going, though as other answers point out, it does not help with identifying hot spots.
carleeto
+2  A: 

Give consideration to the no-profiler option.

It is common wisdom regarding performance problems that measuring is a prerequisite to finding them.
Not so. More on that subject.

This is an example of tuning an app for maximum performance.

In case you're worried about overhead and how to do this in a DSP real-time app, here's how I would do it. Get it running with realistic input, and just halt it in its tracks with the pause button. Capture the call stack into Notepad. Then start it up again and repeat several times. (Throw away any sample that is irrelevant, like waiting for user input or otherwise in idle state.) Notice that this process puts zero profiling overhead on the program.

If you have the problem that your code runs off a timer and is actually running for only a small fraction of the overall time, then a) you may decide that you don't actually have a problem, or b) you can still try to make it go faster. If (b) then wrap a loop around your code so that whatever it does, it repeats 10, 100, or 1000 times, making it take a large enough fraction of time so that samples will land in it. Use those samples to find out what to fix to make it faster. When you are done, remove the outer loop, and it will run like a bandit.
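
The temporary outer loop could look something like this. It is only a sketch with hypothetical names (`run_repeated`, `TUNING_REPEAT`), and it assumes the wrapped work is safe to repeat, for example because it operates on a scratch copy of its input.

```cpp
#include <cassert>
#include <cstddef>

// Repeat count used only while taking stack samples; set to 1 (or remove
// the wrapper entirely) once tuning is done.
#ifndef TUNING_REPEAT
#define TUNING_REPEAT 1000
#endif

// Calls work() `repeats` times so the code occupies a large enough share of
// wall-clock time for manual pauses to land inside it.
template <typename Fn>
void run_repeated(Fn&& work, std::size_t repeats = TUNING_REPEAT) {
    assert(repeats > 0);
    for (std::size_t i = 0; i < repeats; ++i) {
        work();
    }
}
```

A timer callback would then call something like `run_repeated([&] { process_block(scratch); });` during tuning, where `process_block` and `scratch` are placeholders for your own code.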

Mike Dunlavey
That's a sampling profiler - the OSX tool Shark is an excellent sampling profiler, sadly I don't know of anything that comes close for Windows. vTune is a profiler, but it's not great IMO (see http://blog.vlad1.com/2008/07/16/why-cant-vtune-be-more-like-shark/)
Adam Bowen
@Adam-Bowen: It's getting there. It looks like it captures stackshots on wall clock time. It's not clear that you can easily constrain it to work only during periods of subjective slowness. It's not clear if it gives you *line-level* percent of inclusive time. Then there's the issue that even the stackshots may not be sufficient state information - nothing compares to getting your fingers right in the code.
Mike Dunlavey
@Adam-Bowen: For those who really want to use a profiling tool, this is one that I think does almost everything right (although it's in Linux): http://www.rotateright.com Also there are the useful unix utilities **pstack** and **lsstack** - there may be something like that for OSX and Windows - I don't know.
Mike Dunlavey
Interesting, I've not seen that one before - might give it a go next time we're doing an optimisation cycle.
Adam Bowen
@Adam-Bowen: I've had a little discussion with them: http://rotateright.com/forum/index.php?topic=85.0
Mike Dunlavey
Thanks. Took me quite some time to read all the pages you linked to, but I will keep this in mind for the future. It seems to me that this method would complement the hand-rolled solution quite well - the hand-rolled one would be good at optimizing sections of code while ignoring the rest. This one would be good for optimizing the application in general. I will definitely be giving this one a try.
carleeto
@carleeto: I hope I didn't lay too much material on you. The basic idea is pretty simple. Good luck.
Mike Dunlavey
A: 

Since it is a real time app, I need the profiler to be as fast as possible.

I don't know what you mean by real-time (hard, semi-hard, soft).

I once had to improve the performance of a fax server. The fax protocol is such that if either end delays too long (some tens or hundreds of milliseconds, depending) then the fax session is disconnected. I was therefore unable to use any commercial profiler that was available to me, because they slowed the execution of the server too much. Instead I added various log messages (with time stamps) to instrument the code and thus find the bottlenecks.

ChrisW
I mean profile an audio streaming application with 1ms latency.
carleeto
+1  A: 

We do a fair amount of profiling, and have used Shark (OSX only), vTune, Glowcode and the old favourite of counters/clocks.

Of those Shark is by far and away the best (and free!), to the extent that I try to keep code portable to OSX so I can use it to profile. Unfortunately, it doesn't meet your requirements.

vTune was wholly unimpressive. It was too complicated to get a decent profile out of without being an expert in what all the profiling options do, the front-end GUI frequently crashed or just plain broke, and its sampler doesn't sample the call stack, making it almost useless for actually seeing how bottlenecks in your program are arising. It was also expensive (although we ended up buying a licence). In its favour it is cross platform, and you can get a 30-day trial to see if you like it.

Glowcode was decent. IIRC it is Windows-only, and it also offers a free trial. It's been a while since we used it but it might not be a bad place to start.

We mostly use clocks for our embedded code, which runs single process with little or no system overhead - meaning we can count exactly the number of clock cycles operations take. Personally I wouldn't recommend "rolling your own" profiling code (except at an extremely coarse scale) for two reasons:

  • It's difficult to account for how your process gets scheduled and what other operations are running.
  • Profilers often highlight hotspots you would never have considered to be hotspots without their intervention.
Adam Bowen
+1 for a good post, but regarding embedded code, I have this war story: http://stackoverflow.com/questions/890222/analyzing-code-for-efficiency/893272#893272 . Basically, there's an orthogonal view that measuring time is only a distraction from the process of finding code that needs optimizing.
Mike Dunlavey
I agree on the sampling, although I still prefer to use a profiler to do the hard work for me (i.e. the sampling/callstack resolution etc.). My major problem with vTune is that it doesn't sample enough information to work out what the problem is. When I've used timing it's usually to identify part of the algorithm that's performing slowly (and so is a candidate for looking for better complexity in the algorithm) rather than looking for lines of code to optimise.
Adam Bowen
Thanks. To date, my only fallback has been to profile on Mac OS X using Shark. I was looking at VTune and AMD's tools, but the fact that each works only with their processors kept me away. I have tried Glowcode, but it has crashed on me 6 times during the evaluation, so I'm not likely to pick it up. AQTime has been the closest I've come to getting a profiler that does what I need it to do, but I've had a couple of crashes with it too.
carleeto
A: 

I've used AMD CodeAnalyst to great effect, but naturally it has to run on an AMD processor. It's more a case of "tells you more than you want to know" if you dig deep enough. http://developer.amd.com/cpu/codeanalyst/Pages/default.aspx

Arthur Kalliokoski
+1  A: 

I occasionally use an application called Very Sleepy: http://www.codersnotes.com/sleepy

It's a simple, unassuming tool, and I don't know how well it suits your needs. It's done well enough for me, as a fairly straightforward sampling profiler. I am authoring a .NET profiler called SlimTune that will gain native support eventually -- but it's not in there now, and it could be some months before it's available.

Promit
I tried Very Sleepy. It just gives me errors about not being able to launch the process, and it doesn't attach to the process either, so I guess I'm out of luck there.
carleeto
+1  A: 

Performance Validator (from Software Verification, the company I work for) seems to match what you are looking for:

  • Sampling and non-sampling modes.
  • C, C++, Delphi, Windows.
Stephen Kellett