I understand that the processor brings data into the cache in cache lines, which (on my Atom processor, for instance) are 64 bytes each, whatever the size of the data actually being read.

My question is:

Imagine that you need to read one byte from memory. Which 64 bytes will be brought into the cache?

The two possibilities I can see are that either the 64 bytes start at the closest 64-byte boundary below the byte of interest, or the 64 bytes are spread around the byte in some predetermined way (for instance, half below and half above, or all above).

Which is it?

+1  A: 

Processors may have multi-level caches (L1, L2, L3), and these differ in size and speed.

Yet, to understand what exactly goes into each cache you'll have to study the branch predictor used by that specific processor, and how the instructions/data of your program behave against it.

Read about branch predictors, CPU caches and cache replacement policies.

This is not an easy task. If at the end of the day all you want is a performance test, you can use a tool like Cachegrind. However, as it is a simulation, its results may differ to some degree from what happens on real hardware.

jweyrich
+1  A: 

I can't say for certain, as all hardware is different, but it is typically "the 64 bytes start at the closest 64-byte boundary below", as that is a very fast and simple operation for the CPU.
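To illustrate why this is cheap, here is a minimal sketch in C of the address arithmetic, assuming a 64-byte line (the size from the question; other CPUs may differ). The names `CACHE_LINE_SIZE`, `cache_line_base` and `cache_line_offset` are made up for this example, and real hardware does this in wiring rather than in code: with a power-of-two line size, finding the boundary amounts to ignoring the low 6 address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed line size; 64 bytes matches the question, but it varies by CPU. */
#define CACHE_LINE_SIZE 64u

/* Round an address down to the start of the cache line that contains it. */
static uintptr_t cache_line_base(uintptr_t addr)
{
    /* The mask trick only works for power-of-two line sizes. */
    assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1u)) == 0);
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Offset of the byte inside its cache line (0..63 for a 64-byte line). */
static uintptr_t cache_line_offset(uintptr_t addr)
{
    return addr & (uintptr_t)(CACHE_LINE_SIZE - 1u);
}
```

For example, a byte at address 0x1234 sits at offset 0x34 in the line that starts at 0x1200, so under this assumption the bytes 0x1200 through 0x123F would be the ones brought in.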

bramp
+1  A: 

Read this: What every programmer should know about memory. Then read it again. Better (pdf) source here.

andersoj
+2  A: 

If that cache line is not already present in the cache, your CPU will request the 64 bytes that begin at the cache line boundary (the largest address at or below the one you need that is a multiple of 64). It will start by bringing in the 8 bytes around the byte you need, again aligned to a multiple of 8. This is for the simple reason that all modern PC memory modules have a 64-bit (8-byte) data bus and therefore transfer 8 bytes at a time. As a rule of thumb, if the processor can't forecast that memory access, the retrieval can take 10 to 30 nanoseconds. Then, while you're working with the byte, the processor will pull the remaining 56 bytes into the cache.
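As a rough sketch of that fetch order in C, assuming a 64-byte line and 8-byte transfers as described above: the helper names (`first_chunk_base`, `first_chunk_index`, `remaining_bytes`) and the two constants are hypothetical, chosen only for this illustration, and the code just reproduces the address arithmetic, not the memory controller's actual behavior.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE  64u  /* assumed cache line size in bytes */
#define CHUNK_SIZE 8u   /* assumed width of one memory transfer in bytes */

/* Start of the aligned 8-byte chunk containing addr: the first piece to arrive. */
static uintptr_t first_chunk_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CHUNK_SIZE - 1u);
}

/* Index (0..7) of that chunk within its 64-byte line. */
static unsigned first_chunk_index(uintptr_t addr)
{
    /* A line must hold a whole number of chunks for the index to make sense. */
    assert(LINE_SIZE % CHUNK_SIZE == 0);
    return (unsigned)((addr & (LINE_SIZE - 1u)) / CHUNK_SIZE);
}

/* Bytes of the line still outstanding once the first chunk has arrived. */
static unsigned remaining_bytes(void)
{
    return LINE_SIZE - CHUNK_SIZE;
}
```

For a byte at 0x1234, the first chunk would cover 0x1230 through 0x1237 (chunk 6 of the line starting at 0x1200), leaving 56 bytes to follow.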