views: 442
answers: 5

When writing simulations my buddy says he likes to try to write the program small enough to fit into cache. Does this have any real meaning? I understand that cache is faster than main memory (RAM). Is it possible to specify that you want the program to run from cache, or at least to load the variables into cache? We are writing simulations, so any performance/optimization gain is a huge benefit.

Thanks for your help. If I've made a duplicate or you know of any good links explaining CPU caching then point me in that direction.

+5  A: 

At least with a typical desktop CPU, you can't really specify much about cache usage directly. You can still try to write cache-friendly code, though. On the code side, this often means that loop unrolling (to pick one obvious example) is rarely useful: it expands the code, and a modern CPU typically minimizes the overhead of looping anyway. You can generally do more on the data side, improving locality of reference and avoiding cache conflicts (e.g., two frequently-used pieces of data that map to the same part of the cache and keep evicting each other while other parts remain unused).

Jerry Coffin
"a modern CPU typically minimizes the overhead of looping". Well, in a simple benchmark, unrolling loops usually appears to give fantastic boosts. I've certainly seen unrolling by even 2 or 4 double code speed, on a modern CPU with compiler optimisation, provided it doesn't prevent the compiler from vectorizing. This is because benchmark code always fits in cache. Then in real applications, all your unrolled loops add up, as do the cache misses. Basically, the time taken to do X then Y does not equal the time taken to do X plus the time taken to do Y...
Steve Jessop
A: 

If I were you, I would make sure I know which parts of code are hotspots, which I define as

  • a tight loop not containing any function calls, because if it calls any function, then the PC (program counter) will be spending most of its time in that function,
  • that accounts for a significant fraction of execution time (like >= 10%) which you can determine from a profiler. (I just sample the stack manually.)

If you have such a hotspot, then it should fit in the cache. I'm not sure how you tell it to do that, but I suspect it's automatic.

Mike Dunlavey
+5  A: 

Here's a link to a really good paper on caches/memory optimization by Christer Ericson (of God of War I/II/III fame).

It's a couple of years old but it's still very relevant.

/A.B.

Andreas Brinck
+1  A: 

Many C/C++ compilers prefer to optimize for size rather than for "speed". That is, smaller code often executes faster than heavily unrolled code because of cache effects.

George V. Reilly
GCC has optimization flags (such as -O3) that will try to make fast code, with the possible drawback of making the program bigger.
Casey
+2  A: 
Crashworks