Let arr be an array of dimension 16 x 20.
Below is the cachegrind (Valgrind) output for the following code snippet.

for (i = 0; i < 20; i++)
    arr[0][i] = 0;


Ir  I1mr  I2mr  Dr  D1mr  D2mr  Dw  D1mw  D2mw

64     0     0  41     0     0   1     0     0
60     0     0  20     0     0  20     2     2

I have read what these individual parameters mean in the Valgrind documentation, but I am not able to reconcile them with the figures above. For example, for the for loop, do we really have 41 data cache reads? And for the array arr, how can we have 2 L2 write misses?

My configuration is L1d = L1I = 32KB, L2 = 2MB, 64 byte cache line size, and 8-way set associative.

A: 

Most of your data reads come from the loop variable i.

21 reads from the conditional i < 20.
20 reads from i++.
20 reads from i in the lvalue arr[0][i].
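The tally above can be checked with a small sketch that replays the loop and counts each read of i. This is only an illustration: count_reads_of_i and struct read_counts are hypothetical names, and it assumes an unoptimised build in which i lives in memory and is reloaded for every use.

```c
#include <assert.h>

/* Hypothetical tally of the reads of i, split by the source line
   they would be charged to. Assumes an unoptimised build where i is
   reloaded from memory on every use. */
struct read_counts {
    int for_line;   /* reads charged to the `for (...)` line   */
    int body_line;  /* reads charged to `arr[0][i] = 0;`        */
};

struct read_counts count_reads_of_i(int n)
{
    struct read_counts c = {0, 0};
    int i;

    assert(n >= 0);
    for (i = 0; ; i++) {
        c.for_line++;        /* the test i < n reads i            */
        if (!(i < n))
            break;
        c.body_line++;       /* indexing arr[0][i] reads i        */
        c.for_line++;        /* i++ reads i before updating it    */
    }
    return c;
}
```

Calling count_reads_of_i(20) gives 41 reads on the for line (21 tests plus 20 increments) and 20 on the assignment line, matching the Dr column.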

I'm not up to date on how caches work, but assuming a 32-bit int array, your writes cover 80 bytes, which is about two cache lines at 64 bytes each. Wild guess: those two lines are your write misses, since neither is in the cache before you first write to it.

If you unroll the loop, you will see the counts collapse to small numbers.

arr[0][0]=0; 
arr[0][1]=0;
..    
Erik Olson
A: 

I think the figures quoted above may be erroneous, as they were taken from inside a larger program, so other variables may have affected them.

anup
I was able to reproduce your counts.
Erik Olson
A: 

As Erik Olson says, the 41 reads on the for line are all reads of i: 21 in the i < 20 test and 20 in the i++ (if you compile with optimisation, these should reduce).

There are two L2 write misses because your 20 integers cover 80 bytes, which is (at best) two cache lines. Depending on the alignment of the array, it might cover 3 cache lines, which would cause three write misses.
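The two-versus-three distinction can be worked through with a short sketch. The helper name cache_lines_touched is hypothetical, and the figures assume 4-byte ints (so 20 of them occupy 80 bytes) and the 64-byte line size given in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of cache lines touched by nbytes of
   contiguous data that starts start_offset bytes past a line boundary. */
size_t cache_lines_touched(size_t start_offset, size_t nbytes,
                           size_t line_size)
{
    size_t first, last;

    assert(line_size > 0);
    if (nbytes == 0)
        return 0;
    first = start_offset / line_size;                /* line of first byte */
    last  = (start_offset + nbytes - 1) / line_size; /* line of last byte  */
    return last - first + 1;
}
```

With 80 bytes and 64-byte lines, a row that starts on a line boundary (offset 0) touches 2 lines, while one that starts 52 bytes into a line touches 3, which is the alignment effect described above.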

caf