I'm running a benchmark on a Xeon server, and I repeat the executions 2-3 times. I'd like to erase the contents of the L1 and L2 caches between runs. Can you suggest any methods for doing so?
A:
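Try repeatedly accessing a large block of data via the CPU (i.e. not by DMA). For example:
#include <stdlib.h>

int main(void) {
    const int size = 20*1024*1024; // Allocate 20 MB; set this much larger than L2
    volatile char *c = malloc(size); // volatile keeps the compiler from dropping the writes
    if (c == NULL) return 1;
    for (int i = 0; i < 0xffff; i++) // repeat the sweep many times
        for (int j = 0; j < size; j++)
            c[j] = (char)((unsigned)i * (unsigned)j); // unsigned avoids signed overflow
    free((void *)c);
    return 0;
}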
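If the sweep has to run between benchmark iterations rather than as a separate program, it could be wrapped in a helper along these lines. This is only a sketch: the name `evict_data_caches` is hypothetical, and the buffer size and the 64-byte line size passed in are assumptions that should be checked against the actual CPU. Touching one byte per cache line is enough to pull the whole line in, so the sweep is much cheaper than writing every byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: writes one byte per cache line across a buffer of
 * `bytes` bytes, so that data cached earlier is likely to be evicted.
 * `line` is the assumed cache-line size (commonly 64 bytes on x86).
 * Returns the number of lines touched, or 0 if allocation failed. */
static size_t evict_data_caches(size_t bytes, size_t line)
{
    assert(line > 0);

    /* volatile stops the compiler from optimizing the writes away */
    volatile unsigned char *buf = malloc(bytes);
    size_t touched = 0;

    if (buf == NULL)
        return 0;

    for (size_t i = 0; i < bytes; i += line) {
        buf[i] = (unsigned char)i; /* the write brings the line into cache */
        touched++;
    }

    free((void *)buf);
    return touched;
}
```

One possible use is to call `evict_data_caches(64u * 1024 * 1024, 64)` right before each timed run. Note that this only targets the data caches, and a single linear sweep gives no hard guarantee that every old line is gone.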
However, depending on the server, the disk cache (in memory) may be a bigger problem than the L1/L2 cache. On Linux, for example, you can drop it (as root) with:
sync
echo 3 > /proc/sys/vm/drop_caches
Edit: It is trivial to generate a large program that does nothing:
#!/usr/bin/ruby
puts "main:"
200000.times { puts " nop" }
puts " xor rax, rax"
puts " ret"
Running the generated code (not the script) a few times under different names should do the job.
Maciej Piechotka
2010-08-09 19:26:08
Most modern CPUs have separate instruction and data caches. Cycling through 20M of RAM might clear the data cache, but it won't touch the instruction cache. Additionally, there's no guarantee the CPU will use all of its cache; it might just reuse the same small section continuously.
Chris S
2010-08-09 20:03:17
The solution is basically the same: generate a lot of code and execute it.
Maciej Piechotka
2010-08-09 20:47:05
Newer processors are going to recognize the pattern and will not invalidate the existing cache lines, so your program will only use 2 (or so) lines of cache. If cache is a big factor, it is better to just turn it off and not use it. On the other hand, it probably isn't making two hoots of difference in the first place.
Chris S
2010-08-09 23:06:25
I disagree with 'just turning it off'. Cache affects optimization techniques to a large extent, and turning it off will affect the results. It is better to randomize the technique (e.g. emit random instructions such as `nop`, `xor rax, rbx`, `add rax, rbx`, etc.).
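A rough sketch of that randomized generator, written in C rather than Ruby: the function name `emit_filler` and the seed parameter are hypothetical, and the output keeps the same bare `main:` layout as the script in the answer, so any directives your assembler needs would still have to be added. The filler instructions only write `rax`, which is cleared again before `ret`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Filler instructions named in the comment; each one only modifies rax. */
static const char *const FILLER[] = { "nop", "xor rax, rbx", "add rax, rbx" };

/* Hypothetical helper: writes `count` randomly chosen filler instructions
 * to `out`, framed by a `main:` label and a final `xor rax, rax` / `ret`.
 * Returns the number of filler instructions written, or -1 on I/O error. */
static int emit_filler(FILE *out, unsigned seed, int count)
{
    const int kinds = (int)(sizeof FILLER / sizeof FILLER[0]);
    int emitted = 0;

    assert(out != NULL);
    srand(seed); /* a different seed gives a different instruction mix */

    if (fputs("main:\n", out) == EOF)
        return -1;
    for (int i = 0; i < count; i++) {
        if (fprintf(out, " %s\n", FILLER[rand() % kinds]) < 0)
            return -1;
        emitted++;
    }
    if (fputs(" xor rax, rax\n ret\n", out) == EOF)
        return -1;
    return emitted;
}
```

Calling `emit_filler(stdout, 1u, 200000)` with a few different seeds would produce several distinct large programs to assemble and run between benchmark iterations. Whether this actually defeats the pattern detection mentioned above depends on the CPU and is not verified here.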
Maciej Piechotka
2010-08-10 15:01:15