Several of the resources I've consulted on the internet disagree about how set associative caching works.

For example, Hardware Secrets seems to believe it works like this:

Then the main RAM memory is divided in the same number of blocks available in the memory cache. Keeping the 512 KB 4-way set associative example, the main RAM would be divided into 2,048 blocks, the same number of blocks available inside the memory cache. Each memory block is linked to a set of lines inside the cache, just like in the direct mapped cache.

http://www.hardwaresecrets.com/printpage/481/8

They seem to be saying that each cache block (4 cache lines) maps to a particular block of contiguous RAM, and that non-contiguous blocks of system memory (RAM) can't map to the same cache block.

This is Hardware Secrets' picture of how they think it works: http://www.hardwaresecrets.com/fullimage.php?image=7864

Contrast that with Wikipedia's picture of a set associative cache: http://upload.wikimedia.org/wikipedia/commons/9/93/Cache%2Cassociative-fill-both.png

Brown disagrees with Hardware Secrets:

Consider what might happen if each cache line had two sets of fields: two valid bits, two dirty bits, two tag fields, and two data fields. One set of fields could cache data for one area of main memory, and the other for another area which happens to map to the same cache line.

http://www.spsu.edu/cs/faculty/bbrown/web_lectures/cache/

That is, non-contiguous blocks of system memory can map to the same cache block.

How are the relationships between non-contiguous blocks of system memory and cache blocks created? I read somewhere that these relationships are based on cache strides, but I can't find any information on cache strides beyond the fact that they exist.

Who is right? If striding is actually used, how does it work, and do I have the correct technical name? How do I find the stride for a particular system? Is it based on the paging system? Can someone point me to a URL that explains N-way set associative caches in great detail?

Also see: http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/set.html

+3  A: 

When I teach cache memory architecture to my students, I start with a direct-mapped cache. Once that is understood, you can think of N-way set associative caches as parallel blocks of direct-mapped cache. To understand that both figures may be correct, you need to first understand the purpose of set-assoc caches.

They are designed to work around the problem of 'aliasing' in a direct-mapped cache, where multiple memory locations can map to the same cache entry. This is illustrated in the Wikipedia figure. So, instead of evicting a cache entry, we can use an N-way cache to store the other 'aliased' memory locations.

In effect, the Hardware Secrets diagram would be correct assuming the order of replacement is such that the first chunk of main memory is mapped to Way-1, the second chunk to Way-2, and so on. However, it is equally possible to have the first chunk of main memory spread over multiple Ways.

Hope this explanation helps!

PS: Contiguous memory locations are only needed for a single cache line, exploiting spatial locality. As for the latter part of your question, I believe that you may be confusing several different concepts.
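To make this concrete, here is a minimal sketch (the parameters are hypothetical: 64-byte lines, 64 sets) of how a cache splits an address into offset, index, and tag fields. Any two addresses that share the same index bits compete for the same set, no matter how far apart they are in memory:

```python
LINE_SIZE = 64      # bytes per cache line (hypothetical)
NUM_SETS  = 64      # number of sets in the cache (hypothetical)

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 6 bits of byte offset
INDEX_BITS  = NUM_SETS.bit_length() - 1    # 6 bits of set index

def split(addr):
    """Decompose an address into (tag, index, offset) fields."""
    offset = addr & (LINE_SIZE - 1)
    index  = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Non-contiguous addresses can still collide: these three all land in set 0,
# distinguished only by their tags.
for addr in (0x1000, 0x2000, 0x3000):
    tag, index, offset = split(addr)
    print(hex(addr), "-> set", index, "tag", tag)
```

In a 4-way cache, up to four such aliased lines can be resident in the set at once; a direct-mapped cache would have to evict on every alternation between them.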

sybreon
How does it decide which chunks of system memory to map to which cache blocks? If I'm writing a program, how do I determine the system memory width that aligns with the cache?
e5
The cache memory decides it based on a replacement algorithm. It typically uses a form of pseudo-LRU. This is handled by hardware and it is out of your control. You can typically determine the cache memory parameters from the hardware data-sheet or user-guide. It is rarely a software concern unless you are doing embedded code that needs to run on a very tight time budget.
sybreon
I am not concerned with which blocks of system memory get evicted, but rather with which blocks of system memory get mapped to which cache blocks. That is, what is the math an x86 chip uses to decide that system memory locations 1000, 2000, and 3000 should map to cache block 5? I am writing software whose purpose is to manipulate the cache, so it is purely concerned with cache mapping. I wrote a program to determine this mapping through experimentation, but I want an article or paper to double-check my results.
e5
It's a combination of the index and the pseudo-LRU algorithm. The index is part of the memory address. The eviction algorithm is the one that decides which block to replace and hence, it also determines how it is mapped. Since your concern is x86 specific, I would recommend that you trawl through Intel documents. Look for information on their pseudo-LRU implementation.
sybreon
I read a few of Intel's developer documents. I was unable to find an explanation in more depth than anyone else has provided. My understanding is that the eviction algorithm determines which cache line, within a cache block, to evict, but not which cache block maps to which memory addresses. Is this an incorrect assumption?
e5
Yes. Since it decides which line in which block to evict, it also determines what replaces it, so in essence it is also deciding which block maps where. E.g. 1000, 2000 and 3000 all alias to the same index (00), and there are two ways (A, B). So, first reading 1000 may store it in way A. Without the extra way, reading 2000 would evict 1000. But with two ways, 1000 can sit in A and 2000 in B. Then, reading 3000, the LRU algorithm has to decide whether to evict 1000 or 2000, replacing way A or B with 3000.
sybreon
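The two-way example in the comment above can be sketched as a tiny simulation of a single set with true LRU replacement (real x86 caches use pseudo-LRU variants, which differ in detail); the tags 1000, 2000, 3000 stand in for the aliased addresses:

```python
class TwoWaySet:
    """One set of a 2-way set associative cache with true LRU replacement."""
    def __init__(self):
        self.ways = []          # most-recently-used tag kept last

    def access(self, tag):
        if tag in self.ways:                 # hit: refresh LRU order
            self.ways.remove(tag)
            self.ways.append(tag)
            return "hit"
        victim = None
        if len(self.ways) == 2:              # miss in a full set: evict LRU
            victim = self.ways.pop(0)
        self.ways.append(tag)
        return f"miss (evicted {victim})" if victim else "miss"

s = TwoWaySet()
print(s.access(1000))   # miss -> set holds [1000]
print(s.access(2000))   # miss -> [1000, 2000], no eviction needed
print(s.access(3000))   # miss (evicted 1000) -> [2000, 3000]
```

With only one way (direct-mapped), the second access would already have evicted 1000, which is exactly the aliasing problem the answer describes.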
Right, but what is deciding that the system memory blocks 1000, 2000 and 3000 alias to the same index (00)? What is that logic, and how is that decided?
e5
That's done using the index, which is typically taken from the lower-order bits of the memory address. So, in this example, the index bits for 1000, 2000 and 3000 are all 00. The number of lower-order bits used depends on the cache configuration.
sybreon
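As a footnote to the thread: the number of index bits falls straight out of the cache configuration, since sets = cache size / (line size × associativity). A sketch using the 512 KB 4-way example from the question, with an assumed 64-byte line:

```python
def index_bits(cache_bytes, line_bytes, ways):
    """Number of address bits used as the set index."""
    num_sets = cache_bytes // (line_bytes * ways)
    assert num_sets & (num_sets - 1) == 0, "expected a power-of-two set count"
    return num_sets.bit_length() - 1

# 512 KB cache, 4-way, 64-byte lines (line size assumed):
sets = (512 * 1024) // (64 * 4)
print(sets, "sets,", index_bits(512 * 1024, 64, 4), "index bits")
```

This yields 2048 sets, which matches the "2,048 blocks" figure in the Hardware Secrets quote; each set then holds 4 lines, one per way.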