The CPU cache always interferes when we test the performance of some code.

gettime();
func1();
gettime();

gettime();
func2();
gettime();
// func2 appears faster because of the cache (or because func1() absorbed the page faults).
// But this often leads us to the wrong conclusion.

When you measure your code's performance, how do you remove the cache's influence?

I'm looking for functions or techniques to do this on Windows.
Please share your tips. Thanks.

A: 

Good code takes advantage of the cache, so you can't just turn it off (well, you can, but those results will be completely irrelevant).

What you need is to flush (or invalidate) the cache between successive tests. Here are some hints: http://stackoverflow.com/q/2213428/395626
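
One portable way to do that is sketched below (the 64 MiB buffer size is an assumption; it just has to be comfortably larger than your last-level cache): walk a large buffer between timed runs so that whatever the previous run left in the cache gets evicted.

#include <cstddef>
#include <vector>

// Evict previously cached data by streaming through a buffer that is
// (assumed to be) much larger than the last-level cache.
void flush_cache()
{
    static std::vector<char> junk(64 * 1024 * 1024); // 64 MiB, tune to your CPU
    volatile char sink = 0;
    for (std::size_t i = 0; i < junk.size(); i += 64) // one touch per cache line
    {
        junk[i]++;       // the write pulls the line in, pushing older lines out
        sink = junk[i];  // volatile read keeps the loop from being optimized away
    }
    (void)sink;
}

Call flush_cache() before each timed section. If you know exactly which buffer you want evicted, the x86 _mm_clflush intrinsic (from <immintrin.h>) can flush its cache lines directly instead.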

ruslik
+1  A: 

One thing you can do is to call a function that has a lot of code and accesses a lot of memory in between calls to the item you are profiling. For example, in pseudo code (to be mostly language neutral):

// loop some number of times
{
  //start timing
  profile_func();
  //stop timing
  //add to total time
  large_func(); // Uses lots of memory and has lots of code
}
// Compute the average time of profile_func() by dividing the total time by the number of iterations

The code in large_func() can be nonsense, like some set of operations repeated over and over. The key is that it, or its code, does not get optimized out when you compile, so that it actually clears the CPU's code and data caches (and the L2 and L3 caches as well, if present).
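
A minimal runnable sketch of this framework in C++ is below; the iteration count, the ~32 MB buffer inside large_func(), and the use of std::chrono are my assumptions, and profile_func() is just a hypothetical stand-in for whatever you actually want to measure.

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the routine being profiled.
int profile_func()
{
    int s = 0;
    for (int i = 0; i < 1000; ++i)
        s += i * i;
    return s;
}

// Lots of memory traffic; the volatile sink keeps the work from being
// optimized out, so the data cache really does get clobbered.
void large_func()
{
    static std::vector<int> big(8 * 1024 * 1024); // ~32 MB of ints
    volatile long sink = 0;
    for (std::size_t i = 0; i < big.size(); ++i)
    {
        big[i] = big[i] * 3 + 1;
        sink += big[i];
    }
    (void)sink;
}

int main()
{
    const int iterations = 1000;
    std::chrono::nanoseconds total{0};
    volatile int result_sink = 0;

    for (int i = 0; i < iterations; ++i)
    {
        auto start = std::chrono::steady_clock::now();
        result_sink = result_sink + profile_func(); // keep the call from being removed
        auto stop = std::chrono::steady_clock::now();
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);

        large_func(); // clobber the caches before the next timed call
    }

    std::printf("average: %lld ns per call\n",
                static_cast<long long>(total.count() / iterations));
    return 0;
}

With optimizations on, the volatile sinks are one way to keep the compiler from deleting either the measured call or the cache-clobbering work.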

This is a very important test in many cases. It matters because small, fast functions that are profiled in isolation can run very fast, taking advantage of the CPU cache, inlining, and enregistration (keeping values in registers). But often, in large applications, these advantages are absent because of the context in which these fast functions are called.

As an example, just profiling a function by running it for a million iterations in a tight loop might show that it executes in, say, 50 nanoseconds. Then you run it using the framework I showed above, and all of a sudden its running time can jump drastically into the microseconds, because it can no longer take advantage of having the entire processor (its registers and caches) to itself.

Michael Goldshteyn