ansaurus

Question

Answer 1

+2 A:

L1 caches exist on these platforms. This will almost definitely remain true until memory and front side bus speeds exceed the speed of the CPU, which is very likely a long way off.

On Windows, you can use GetLogicalProcessorInformation to get some cache information (size, line size, associativity, etc.). The Ex version on Win7 gives even more data, such as which cores share which cache. CpuZ also gives this information.

Michael 2009-04-04 00:11:34

Thanks for the suggestions. I was able to run CpuZ -- it told me that my L1 data cache was 32K Bytes (per core). Now I just need to figure out whether or not I trust that information.

nobar 2009-04-04 02:12:08

You can trust it.

Michael 2009-04-04 03:15:22

Can you explain why you are so confident in the accuracy of CpuZ? It's nice that such a tool exists but my confidence is shaken by the fact that I can't find strong corroborating data.

nobar 2009-04-04 03:38:11

I have seen data that indicates that the L2 cache runs at the CPU clock speed (2.5 GHz). To me this suggests that the front side bus speed is irrelevant to the question of L1 existence -- the L2 cache is faster than the FSB.

nobar 2009-04-04 03:42:28

This post also spurred me to find similar Linux based programs: cpuid and x86info. x86info gave me data for L1 that matched what CpuZ said. However, various inconsistencies and warnings by the two programs still left me doubting.

nobar 2009-04-04 23:23:59

Answer 2

+2 A:

Locality of reference has a major impact on the performance of some algorithms; the size and speed of the L1 and L2 caches (and, on newer CPUs, L3) obviously play a large part in this. Matrix multiplication is one such algorithm.
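To make the locality point concrete, here is a minimal sketch of a set-associative LRU cache model that counts misses for two ways of walking a square matrix. The function names (`count_misses`, `matrix_trace`) are made up for illustration, and the cache parameters (32K, 64-byte lines, 8-way) are assumptions borrowed from the figures quoted elsewhere in this thread. It is a toy model, not a description of any real CPU.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes=32 * 1024, line_bytes=64, ways=8):
    """Count misses for a byte-address trace in a set-associative LRU cache."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)          # mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)    # evict the least recently used line
            cache_set[line] = True
    return misses


def matrix_trace(n, row_major, elem_bytes=8):
    """Yield byte addresses for visiting every element of an n x n matrix
    that is stored row by row, walking it by rows or by columns."""
    for i in range(n):
        for j in range(n):
            row, col = (i, j) if row_major else (j, i)
            yield (row * n + col) * elem_bytes


if __name__ == "__main__":
    n = 256  # 256 x 256 elements of 8 bytes = 512K, far larger than the modeled cache
    print("row-major misses:   ", count_misses(matrix_trace(n, True)))
    print("column-major misses:", count_misses(matrix_trace(n, False)))
```

With these assumed parameters the row-major walk misses once per 64-byte line (8192 misses), while the column-major walk misses on every access (65536), because the 2048-byte stride maps each column onto only two cache sets.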

Mitch Wheat 2009-04-04 00:59:37

Answer 3

+10 A:

It is damned near impossible to find specs on Intel caches. When I was teaching a class on caches last year, I asked friends inside Intel (in the compiler group) and they couldn't find specs.

But wait!!! Jed, bless his soul, tells us that on Linux systems, you can squeeze lots of information out of the kernel:

grep . /sys/devices/system/cpu/cpu0/cache/index*/*

This will give you associativity, set size, and a bunch of other information (but not latency). For example, I learned that although AMD advertises their 128K L1 cache, my AMD machine has a split I and D cache of 64K each.
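The same sysfs files can be read from a script. The sketch below assumes a Linux kernel that exposes the usual attribute files (`level`, `type`, `size`, `ways_of_associativity`, `coherency_line_size`, `number_of_sets`) under each `index*` directory; `read_cache_info` is a hypothetical helper name, and any attribute that is missing is simply skipped.

```python
from pathlib import Path

# Attribute files commonly found under cache/index*/ (assumed; missing ones are skipped).
ATTRIBUTES = ("level", "type", "size", "ways_of_associativity",
              "coherency_line_size", "number_of_sets")


def read_cache_info(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return one dict per cache index directory under <cpu_dir>/cache."""
    cache_dir = Path(cpu_dir) / "cache"
    if not cache_dir.is_dir():
        return []            # not Linux, or the kernel does not expose cache info
    caches = []
    for index_dir in sorted(cache_dir.glob("index*")):
        info = {}
        for attr in ATTRIBUTES:
            try:
                info[attr] = (index_dir / attr).read_text().strip()
            except OSError:
                pass         # attribute not provided on this system
        caches.append(info)
    return caches


if __name__ == "__main__":
    for cache in read_cache_info():
        print(cache)
```

Like the grep command, this reports what the kernel believes about the caches; it does not measure anything and gives no latency figures.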

Two suggestions which are now mostly obsolete thanks to Jed:

AMD publishes a lot more information about its caches, so you can at least get some information about a modern cache. For example, last year's AMD L1 caches delivered two words per cycle (peak).
The open-source tool valgrind has all sorts of cache models inside it, and it is invaluable for profiling and understanding cache behavior. It comes with a very nice visualization tool kcachegrind which is part of the KDE SDK.

As basic data: in Q3 2008 a cache line was 64 bytes, L1 cache was 2-way associative and latency was 1/2 cycle, L2 cache was 16-way associative and latency was about 10 cycles. (All data is from AMD, but trusted colleagues tell me that Intel's designs are similar. Jed's technique shows a split I and D cache at L1, 8-way associative, 32K each.)

Norman Ramsey 2009-04-04 01:05:07

I've already started trying to use kcachegrind. As far as I have found so far, I have to tell the tool what my cache details are -- that's what led me to ask the question. You mentioned "cache models". Do you mean to say that valgrind might know the details that I'm looking for?

nobar 2009-04-04 03:48:12

Yes, definitely. Valgrind queries CPUID, and if it recognizes your CPU, it uses a model for that CPU.

Norman Ramsey 2009-04-04 18:58:35

Like some of the other tools that I have run on Linux (cpuid and x86info), valgrind seems to be confused about my machine's cache configuration. Maybe this is just a matter of not recognizing my CPU, or maybe it is an indication that the information is being withheld by Intel.

nobar 2009-04-04 23:06:12

Intel L1 is 8-way associative. On Linux, you can pull all the numbers from `/sys/devices/system/cpu/cpu*/cache/index*/`. Also, systems with glibc usually have `getconf(1)`, used like `getconf LEVEL1_DCACHE_ASSOC`.
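For scripting, the `getconf` route can be wrapped roughly as follows. This is a sketch that assumes a glibc-style `getconf` supporting `-a` (list all variables); `parse_getconf` and `cache_config` are hypothetical helper names. On systems without such a `getconf` it just returns an empty dict.

```python
import shutil
import subprocess


def parse_getconf(text):
    """Pick the numeric *CACHE* variables out of `getconf -a` style output."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Variables with no value show up as a bare name and are skipped.
        if len(parts) == 2 and "CACHE" in parts[0] and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def cache_config():
    """Run `getconf -a` if it is available and return the cache variables."""
    if shutil.which("getconf") is None:
        return {}
    result = subprocess.run(["getconf", "-a"], capture_output=True, text=True)
    return parse_getconf(result.stdout)


if __name__ == "__main__":
    for name, value in sorted(cache_config().items()):
        print(name, value)
```

A value of 0 generally means the C library could not determine that figure, not that the cache is absent.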

Jed 2010-02-01 12:24:53

@Jed: you rock. Answer updated.

Norman Ramsey 2010-02-02 01:50:34

@Jed: Thanks for posting those great suggestions! @Norman: Neat trick with grep -- thanks for updating your post! @getconf: Where've you been all my life? :-)

nobar 2010-02-17 01:34:58

Answer 4

+5 A:

You are looking at the consumer specifications, not the developer specifications. Here is the documentation you want. The cache sizes vary by processor family sub-models, so they typically are not in the IA-32 development manuals, but you can easily look them up on NewEgg and such.

Edit: More specifically: Chapter 10 of Volume 3A (Systems Programming Guide), Chapter 7 of the Optimization Reference Manual, and potentially something in the TLB page-caching manual, although I would assume that one is further out from the L1 than you care about.

Not Sure 2009-04-04 01:06:21

I couldn't find real cache data in these manuals. Can you cite volume and page number?

Norman Ramsey 2009-04-04 01:08:26

I'm not really sure what you mean by "real", but chapter 7 of the Optimization manual is one place that goes into some detail. There's also the entire manual on the TLB and page caching. It would help to know what *exactly* you're looking for.

Not Sure 2009-04-04 01:14:48

There's also Chapter 10 of Volume 3A, the Systems programming guide.

Not Sure 2009-04-04 01:16:08

I found Table 10-1 of Volume 3A. It doesn't list individual processors but it does give details (or at least numerical ranges) for cache information for various processor families. It is still a little bit ambiguous (Core 2 Quad isn't explicitly listed for L1), but it's something. Thanks!

nobar 2009-04-04 03:17:19

Like most other resources, newegg doesn't list my Q9300 as having an L1 cache (I also didn't find it clearly indicated in the Intel documentation that you cited). I'm guessing that the L1 cache doesn't exist on that chip -- but I'm still just guessing.

nobar 2009-04-04 03:24:00

My hope was that doing a google search with [q9300 "L1 cache" site:intel.com] would reveal the processor-specific data from "developer specifications". No such luck.

nobar 2009-04-04 23:33:26

Your Q9300 definitely does have an L1 cache - just google "intel q9300 l1 cache" and you'll get specs from the hardware review sites. I can't think of any decent modern processor without an L1 and L2 cache - use CPU-z or somesuch to find out what it is on your machine.

Not Sure 2009-04-06 17:25:21

Answer 5

+2 A:

I did some more investigating. There is a group at ETH Zurich who built a memory-performance evaluation tool which might be able to get information about the size at least (and maybe also associativity) of L1 and L2 caches. The program works by trying different read patterns experimentally and measuring the resulting throughput. A simplified version was used for the popular textbook by Bryant and O'Hallaron.
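The general idea behind such tools can be sketched as below: time dependent reads over working sets of increasing size and look for steps in the per-access cost. This is not the ETH Zurich program or the textbook version, only an illustration with a made-up function name (`chase_time`). In CPython the interpreter overhead is large, so the steps may be faint or invisible; a real measurement would use a compiled language.

```python
import random
import time
from array import array


def chase_time(working_set_bytes, accesses=200_000, elem_bytes=8):
    """Return nanoseconds per access for dependent reads that follow a
    random cycle through a buffer of roughly working_set_bytes."""
    n = max(2, working_set_bytes // elem_bytes)
    order = list(range(n))
    random.shuffle(order)
    nxt = array("q", [0]) * n                 # 8-byte signed integers
    for a, b in zip(order, order[1:] + order[:1]):
        nxt[a] = b                            # slot a stores the next slot to visit
    i = 0
    start = time.perf_counter()
    for _ in range(accesses):
        i = nxt[i]                            # each read depends on the previous one
    elapsed = time.perf_counter() - start
    return elapsed / accesses * 1e9


if __name__ == "__main__":
    size = 16 * 1024
    while size <= 4 * 1024 * 1024:
        print(f"{size // 1024:6d}K  {chase_time(size):8.1f} ns/access")
        size *= 2
```

The random cycle is used so that a hardware prefetcher cannot easily predict the next address; a sequential or fixed-stride walk would measure something different.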

Norman Ramsey 2009-04-04 19:03:28

I tried these out (and I had written a similar program). The results show performance discontinuities at 32K and 3M on my Q9300. Thanks for the help!

nobar 2009-04-04 22:32:07

L1 memory cache on Intel x86 processors
