tags:

views:

41

answers:

1

Background: I've implemented a stochastic algorithm that requires random ordering for best convergence. Doing so obviously destroys memory locality, however. I've found that by prefetching the next iteration's data, the performance drop is minimized.

I can prefetch n cache lines using _mm_prefetch in a simple, mostly OS+compiler-portable fashion - but what's the length of a cache line? Right now, I'm using a hardcoded value of 64, which seems to be the norm nowadays on x64 processors - but I don't know how to detect this at runtime, and a question about this last year found no simple solution.

I've seen GetLogicalProcessorInformation on windows but I'm leery of using such a complex API for something so simple, and that won't work on macs or linux anyhow.

Perhaps there's some entirely other API/intrinsic that could prefetch a memory region identified in terms of bytes (or words, or whatever) and allows me to prefetch without knowing the cache line length?

Basically, is there a reasonable alternative to _mm_prefetch with #define CACHE_LINE_LEN 64?

+2  A: 

There's a question asking just about the same thing here. You can read it from the CPUID if you feel like delving into some assembly. You'll have to write platform specific code for this of course.

You're probably already familiar with Agner Fog's manuals for optimization which gives the cache information for many popular processors. If you are able to determine the expected CPU's you'll encounter you can just hard-code the cache line sizes and look up the CPU vendor information to set the line size.

Ron Warholic
I wasn't familiar with it - thanks!
Eamon Nerbonne