Let arr be a two-dimensional array of size 16 x 20.
Below is the code snippet, followed by the per-line output that Cachegrind (Valgrind's cache profiler) reports for it.
for (i = 0; i < 20; i++)
    arr[0][i] = 0;
  Ir I1mr I2mr  Dr D1mr D2mr  Dw D1mw D2mw
  64    0    0  41    0    0   1    0    0   for (i = 0; i < 20; i++)
  60    0    0  20    0    0  20    2    2       arr[0][i] = 0;
I have read what each of these counters means in the Valgrind documentation, but I cannot reconcile the definitions with the figures above. For example, does the for loop really perform 41 data reads (Dr)? And for the array arr, how can there be 2 L2 write misses (D2mw)?
My cache configuration: L1d = L1i = 32 KB, L2 = 2 MB, 64-byte cache lines, and 8-way set associative.