views:

1204

answers:

11

How can I count operations in C++? I'd like to analyze code in a better way than just timing it, since the measured time is often rounded down to 0 milliseconds.

+6  A: 

If you are timing code, it's worth running it a lot of times in a loop to avoid the effect of the timer resolution. So you might run the thing you're timing 10,000 times and measure the amount of time it takes to run all the iterations. It will probably only take a few seconds to run and you'll get better timing data.

Greg Hewgill
The only difficulty with this can be that the benchmark thoroughly warms the 'caches', whereas the code normally runs on cold caches. But measuring cold cache performance is hard.
Jonathan Leffler
Yes, unless you're measuring the performance of the cache itself, you'll need to disable caches for proper performance testing.
Greg Hewgill
Doesn't disabling your CPU caches give misleading results in the other direction? Instead of not knowing when you'd lose performance to cache misses, now you don't know when you'd gain performance by cache hits.
Steve Jessop
Oh, I wasn't suggesting disabling the CPU cache. I was thinking of things like application-level caches.
Greg Hewgill
+5  A: 

Using "the number of operations" is a bad idea when thinking about performance. It doesn't take into account the differences between the best case/worst case cycle counts for each operation, the costs of cache misses, pipeline misses, potential (automatic) parallelisation etc.

As Greg says, usually it's a better idea for a microbenchmark to just run the same code enough times to get a decent span of time.

Even better is to run your whole application with a realistic workload and measure the metrics you're really interested in, but that's a different matter...

What is certainly useful is to work out the complexity of your code - know when a method is going to be O(1), O(log n), O(n) etc. That typically doesn't involve knowing the details of what the individual instructions in C++ do - although you do need to know the complexity of anything you call. (Joel's story of Shlemiel the Painter and strlen being the most obvious example.)
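To illustrate the Shlemiel the Painter point with a sketch (illustrative code, not from the answer): calling `strlen` in a loop condition turns a linear scan into a quadratic one, because the string is rescanned on every iteration.

```cpp
#include <cstring>
#include <cstddef>

// O(n^2): strlen() rescans the whole string on every iteration.
size_t count_spaces_slow(const char* s) {
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); ++i)
        if (s[i] == ' ') ++count;
    return count;
}

// O(n): just stop at the terminator instead.
size_t count_spaces_fast(const char* s) {
    size_t count = 0;
    for (size_t i = 0; s[i] != '\0'; ++i)
        if (s[i] == ' ') ++count;
    return count;
}
```

Both functions return the same answer; only the complexity differs, which is exactly the kind of thing operation counting at the instruction level won't tell you directly.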

Jon Skeet
A: 

Use a higher-resolution timer.

Jim Buck
+4  A: 

A sampling profiler is a good choice here. On Windows, you can use the profiler built into Visual Studio, or the xperf tools from the Windows organization. The xperf tools are free. I have written a series of posts on the xperf tools; this one covers profiling.

Foredecker
+2  A: 

Generate the assembly and count operations. Then review the cycles/op that your processor uses. Then remember you're working on a pre-emptive OS and none of that is valid.

More seriously, jack up your n and scale your program to obscene sizes. That will give you an idea of how your program's running time scales.

Paul Nathan
+2  A: 

If you want actual operation counts from your hardware, then you may want to consider installing a package like PAPI (the Performance API), which works across many different OS and processor combinations. It uses actual hardware counters and reports either direct or derived values for many different performance metrics, such as total ops, FLOPS, cache hits/misses, etc. It can also give access to higher-resolution timers.

It's not the easiest package ever, but the level of reporting can really help you analyze the behavior of your application on your hardware.

jvasak
+2  A: 

Use Valgrind on Linux. Its Cachegrind and Callgrind tools give instruction-level counts, including cache analysis.

keraba
+2  A: 

You can do precise measurements by reading the CPU's time-stamp counter (TSC), which is incremented once per CPU clock cycle.

Unfortunately, the read is done by inlining a few assembly instructions in the code. Depending on the underlying architecture, the cost of the read varies between ~11 (AMD) and ~33 (Intel) ticks. With a 1 GHz CPU you effectively get nanosecond precision.

In order to perform a reliable and non-invasive measurement of a section of code you can:

  • Prevent CPU frequency scaling by disabling features such as AMD Cool'n'Quiet or Intel SpeedStep.
  • Repeat the test several times, collecting the measurements in an array, then save the data to a file for offline analysis.
  • Choose a real-time scheduling policy for the process under test, such as SCHED_RR or SCHED_FIFO. Real-time policies reduce the number of context switches between the process under test and other normal processes/kernel threads.
  • Lock all of the process's virtual address space in RAM by means of the mlockall() system call.

Here you can find a quasi-portable C++ class I wrote for Linux, derived from the Linux kernel and designed to read the TSC on the i386, x86_64 and ia64 architectures.

Nicola Bonelli
The TSC can sometimes be unreliable, especially in multicore systems. See http://en.wikipedia.org/wiki/Time_Stamp_Counter for more details.
Adam Rosenfield
A: 

Why not just run your code under a profiler? That typically gives you data about how much time is spent in each function, as well as how many times each is called.

Knowing how many times a function is called is useful because it can allow you to spot potential performance problems if a function is being called far more often than you feel it should be.

Of course, using a profiler makes your code slower, but that is unavoidable when adding any kind of instrumentation.

MarkR
A: 

If you want precise timing on Windows without using a profiler, you can have a look at this thread, which presents different ways of timing C++ code.

Motti
A: 

If you're concerned about making your program go as fast as possible, and it is single-thread, and you're using an IDE, check this out: How to Optimize Your Program's Performance

Mike Dunlavey