views:

500

answers:

10

Hi there

When it comes to writing C++ code in VS2005, how can you measure the performance of your code?

Is there any built-in tool in VS for that? Can I find out which function or class slows down my application?

Are there other external tools which can be integrated into VS in order to find the bottlenecks in my code?

A: 

We use Rational Quantify, which comes as part of the Rational PurifyPlus set of tools.

It's an excellent tool for profiling application performance.

Canopus
+1  A: 

You can always measure the time and performance of your code yourself. Consult MSDN about the following functions: QueryPerformanceCounter() and QueryPerformanceFrequency().
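
A minimal sketch of how those two calls fit together (work() here is just a placeholder for whatever you want to time):

    // Rough sketch of manual timing with the Win32 high-resolution counters.
    #include <windows.h>
    #include <iostream>

    void work()
    {
        // ... code under test ...
    }

    int main()
    {
        LARGE_INTEGER freq, start, end;
        QueryPerformanceFrequency(&freq);   // counts per second

        QueryPerformanceCounter(&start);
        work();
        QueryPerformanceCounter(&end);

        double elapsedMs = (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
        std::cout << "work() took " << elapsedMs << " ms" << std::endl;
        return 0;
    }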

For more in-depth analysis of memory allocation and execution times we use Memory Validator and Performance Validator from Software Verify. They have support for several languages other than C++.

Magnus Skog
+4  A: 

If you have the Team System edition of Visual Studio 2005, you can use the built-in profiler.

JaredPar
+2  A: 

You want a tool called a profiler. For a free one that covers most simple cases, I recommend Very Sleepy. It works by sampling the application's current call stack at regular intervals.

fbonnet
+3  A: 

You could also use Intel VTune.

Simon H.
+1  A: 

I think measuring performance, and locating code to optimize, are different problems, and require different methods.

To locate code to optimize, I swear by this simple method, which is orthogonal to accepted wisdom about profiling, and does not require you to buy or install any tools.

To measure performance, I'm content with the simple process of running the subject code in a loop and timing it.
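
For illustration, something along these lines is usually enough (subject() is a placeholder for the code under test; clock() is coarse, but fine once the loop runs long enough):

    // Rough sketch of the loop-and-time approach, using clock() from <ctime>.
    #include <ctime>
    #include <cstdio>

    void subject()
    {
        // ... code under test ...
    }

    int main()
    {
        const int iterations = 1000;   // enough repetitions to swamp timer resolution
        clock_t begin = clock();
        for (int i = 0; i < iterations; ++i)
            subject();
        clock_t end = clock();

        double perCallMs = (end - begin) * 1000.0 / CLOCKS_PER_SEC / iterations;
        std::printf("average time per call: %f ms\n", perCallMs);
        return 0;
    }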

EDIT: BTW, I just looked at Very Sleepy, and it appears to be on the right track. It samples the entire call stack, and retains each stack. What I can't tell is if it gives you, for each call instruction or regular instruction, the fraction of stack samples containing that instruction. In my opinion, that is the most valuable statistic, and it does not need to be very precise.

dotTrace, on the other hand, also looks like maybe it retains stack samples, but its UI presentation of call-stack info seems to be a call-tree. What I would look for is something that shows the stack-residence percentage of individual instructions (or statements), because they could be in different branches of the call-tree, and thus the call-tree could miss their importance.

Mike Dunlavey
+4  A: 

AMD CodeAnalyst is available for free for both Windows and Linux and works on most x86 or x64 CPUs (including Intel's).

It has extra features available when you have an AMD processor, of course. It also integrates into Visual Studio.

I've had pretty good luck with it.


Note that there are generally at least two common forms of profiler:

  • instrumenting: alters your build to record information at the beginning and end of certain areas (usually per function)
  • sampling: periodically looks at what code is running to record information

The types of information recorded can include (but are not limited to): elapsed time, # of CPU cycles, cache hits/misses, etc.

Instrumenting can be specific to certain areas of the code (just certain files or just code you compile, not libraries you link to). The overhead is much higher (you're adding code to the project, which takes time to execute, so you're altering timing; you may change program behavior for e.g. interrupt handlers or other timing-dependent code). You're guaranteed that you will get information about the functions/areas you instrument, though.
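
As a rough, hand-rolled sketch of what per-function instrumentation boils down to (real instrumenting profilers inject equivalent begin/end hooks at build time, rather than requiring you to write them):

    // Scope timer that records elapsed time between entering and leaving a function.
    #include <windows.h>
    #include <cstdio>

    class ScopeTimer
    {
    public:
        explicit ScopeTimer(const char* name) : m_name(name)
        {
            QueryPerformanceCounter(&m_start);
        }
        ~ScopeTimer()
        {
            LARGE_INTEGER end, freq;
            QueryPerformanceCounter(&end);
            QueryPerformanceFrequency(&freq);
            double ms = (end.QuadPart - m_start.QuadPart) * 1000.0 / freq.QuadPart;
            std::printf("%s: %.3f ms\n", m_name, ms);
        }
    private:
        const char*   m_name;
        LARGE_INTEGER m_start;
    };

    void expensiveFunction()
    {
        ScopeTimer timer("expensiveFunction");   // begin/end recorded automatically
        // ... body ...
    }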

Sampling can miss very small or very sporadic functions, but modern machines have hardware help to allow you to sample much more thoroughly. Note that some sampling systems may still inject timing differences, although they generally will be much much smaller.

Some profiling tools support a mixture of the above, depending on how you use them.

leander
If someone knows a better name than "function cost", I'd be happy to update this. "Intrusive", maybe? I've heard this referred to by several names.
leander
The term you're looking for is "instrumenting". I don't think your statement about it being less accurate is correct though - in my experience they tend to be very accurate, because they're careful to only time my code. Sampling profilers are much less accurate because they can easily miss pieces of code which run faster than their sample rate.
Peter
I think these terms are artifacts of current profiling technology. I would define the cost of a function as the percent of time it is on the call stack, because if it took zero time, that's what would be saved. I would define the cost of a statement the same way. The fear about sampling missing stuff is unfounded, because no matter how fast it runs, if it costs 10% (say) that's the percent of stack samples that will catch it, on average. Instrumentation mystifies me, because it doesn't capture this cost. And, slowing down the program is no problem, as long as it is unbiased.
Mike Dunlavey
@Peter: thanks, updated to use "instrumenting". The "accuracy" I was speaking of is mostly related to differing program behavior when it runs slower -- in e.g. embedded framerate-driven development, especially when using interrupts. I've adjusted for that above.
leander
@Mike Dunlavey: in many cases (embedded hardware, for instance), there's no support for sampling at all. Also, in some cases, sampling may still affect program behavior slightly; for example, on one debugging kit we use, there's only a fixed-size buffer for sample data; as it fills up it needs to be streamed back to a host PC. Filling it can block execution for the whole device. To avoid this you *can* set your sample rate much lower (4Hz or less), at which point you start to lose information about functions that lie in the 1% range...
leander
@Mike Dunlavey: ...at some point the comparatively "fixed" overhead of certain instrumentation methods looks like a win vs. the less controllable overhead of some sampling methods. (We've actually implemented "software sampling" via interrupt service routine and timer before -- generally not pretty.) All that having been said, I'd love to have both options available to me generally. With the platforms we're working on at the moment we seem to get stuck with one or the other (no compiler support for thorough instrumentation here, no hardware support for sampling there).
leander
... we could discuss the fine points all day. You won't see why it works until you see it work, because it's too much of a mental frame-shift.
Mike Dunlavey
@Mike: no, I understand where you're coming from; I've done what you're describing. It's the "completely unbiased" halting-your-entire-program that is the issue here -- in multiprocessor systems (especially with networking and/or RTC), it's damnably hard. On the system I'm working with, it's only really easy to halt the main processor -- secondary processor, RTC, DMA engine, and networking keep on going... I'm only saying that in this case, I occasionally prefer the predictable bias of instrumentation over what is relatively unpredictable if you're just halting 1 of 4+ processors.
leander
@Mike: (To be fair, I've only worked with a _very_ limited subset of these embedded systems, so it may be much easier to get a uniform halt across multiple ICs some places. Also, you can design your code to be much less prone to different behavior, or at least much more predictable, if portions of the system keep running; ours unfortunately is not. There's no real use sampling if it's going to end up throwing your program into timeout branches, etc.)
leander
@leander: I've also had to do this in highly asynchronous situations. It is not easy, but here's what I did: First, I applied the stack-sampling method to each "thread" and made it as efficient, by itself, as possible. Second, I arranged to capture a time-based log of all inter-process events. I plotted them out on a long timeline and, event-by-event, determined the purpose of each one. I was looking for messages of dubious necessity, and I was looking for delays between receipt of a message and handling of it. This took lots of coffee, and maybe there's a better way, but it really worked.
Mike Dunlavey
... What's common about these is asking the question "What the heck is it waiting for in _this_ nanosecond?" The why-chain that answers that question can stretch across multiple threads or processors, so it's not easy to follow, but it does exist, and I think that's the key information because you're trying to eliminate unnecessary waiting.
Mike Dunlavey
... A short story if you don't mind. I know a pharmacologist who had to time the second-by-second disposition of a drug in rats. They used the interrupt technique. 1) Inject the drug. 2) Wait n seconds. 3) Flash-freeze and decapitate the rat. 4) Measure the drug in the various tissues. So just stopping a process doesn't bother me :-)
Mike Dunlavey
@Mike: fair enough. =) We still _primarily_ use the "hit stop in the debugger to halt, check stack, repeat a few times" process of profiling; the more advanced stuff only comes out in rare situations. So I think we're mostly in agreement. =)
leander
@leander: Nice to hear. I'm just trying to spread the word. The profiler makers are getting closer to it. When they finally catch on, their tools could be quite a bit more useful, IMHO.
Mike Dunlavey
A: 

I've recently tried the JetBrains dotTrace profiler and it looks very good. It helped me locate a number of "black holes" in existing C++ code quite easily.

It works fine in Visual Studio 2005 Professional in a solution which mixes C# and C++ - it uses the right function names for both pieces of code and does an integrated analysis. You can trace for time or memory.

It will be a pity when the evaluation period expires :)

Daniel Daranas
+1  A: 

For intrusive measurement, use the performance counters. Since you're using C++, you should use a facade over this slightly painful API. STLSoft has a family of such things, with different pros and cons. I suggest winstl::performance_counter for highest resolution, or winstl::threadtimes_counter if you want to monitor the performance of a particular thread regardless of other activity in your process(es). There was an article about this in Dr Dobb's several years ago, in which the design rationale behind the facades was described in detail.
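
From memory, usage of the STLSoft facade looks roughly like this; the header path and method names are from recollection and may differ between STLSoft releases, so treat it as a sketch rather than gospel:

    // Rough sketch of winstl::performance_counter usage (details may vary by STLSoft version).
    #include <winstl/performance/performance_counter.hpp>
    #include <cstdio>

    void measured()
    {
        // ... code under test ...
    }

    int main()
    {
        winstl::performance_counter counter;

        counter.start();
        measured();
        counter.stop();

        std::printf("elapsed: %lu us\n",
                    static_cast<unsigned long>(counter.get_microseconds()));
        return 0;
    }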

For non-intrusive measurement, you can't go past VTune.

dcw
A: 

We've had good results from AQTime. It's not free but is cheaper than Visual Studio ;-)

Peter