I've been reading Ulrich Drepper's "What Every Programmer Should Know About Memory", and section 3.3.2, "Measurements of Cache Effects" (about halfway down the page), gives me the impression that accessing any member of a struct causes the whole struct to be pulled into the CPU cache.
Is this correct? If so, how does the hardware know about the layout of these structs? Or does the code generated by the compiler somehow force the entire struct to be loaded?
Or are the slowdowns from using larger structs primarily due to TLB misses caused by the structs being spread out across more memory pages?
The example struct used by Drepper is:
struct l {
    struct l *n;
    long int pad[NPAD];
};
where sizeof(struct l) is determined by NPAD. With NPAD equal to 0, 7, 15, or 31, the pad array adds 0, 56, 120, or 248 bytes, so on a 64-bit system (8-byte long and pointer) consecutive elements are 8, 64, 128, or 256 bytes apart. Drepper assumes 64-byte cache lines and 4 KiB pages.
Just iterating through the linked list gets significantly slower as the struct grows, even though nothing other than the pointer is actually being accessed.