views: 350
answers: 2

Hello,

I want to read a memory location without polluting the cache. I am working on an X86 Linux machine. I tried using the MOVNTDQA assembler instruction:

  asm("movntdqa %[source], %[dest] \n\t"
      : [dest] "=x" (my_var) : [source] "m" (my_mem[0]) : "memory");

my_mem is an int* allocated with new, my_var is an int.

I have two problems with this approach:

  1. The code compiles, but I get an "Illegal Instruction" error when running it. Any ideas why?
  2. I am not sure what type of memory is allocated with new. I would assume it is WB. According to the documentation, the MOVNTDQA instruction only works with the USWC memory type. How can I tell what memory type I am working with?

To summarize, my question is:

How can I read a memory location without polluting the cache on an X86 machine? Is my approach in the right direction, and can it be fixed to work?

Thanks.

+1  A: 

MOVNTDQA is only available with SSE4.1.

Why are you trying to avoid using the cache? CPUs are generally pretty good at deciding what to kick out of the cache and when. If you do genuinely need to, one way would be to arrange for an alias of the memory area you are reading from to be mapped into your address space with caching disabled, and to read from there.

If what you are trying to achieve is actually to minimise your code's impact on another function's working set being held in cache at the time, this should be doable by issuing appropriate prefetch and invalidate instructions.
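A minimal sketch of the "invalidate" part, assuming SSE2 and 64-byte cache lines (the helper name sum_and_evict and the line-size constant are illustrative, not from the answer): the low-priority code reads a buffer and uses the clflush intrinsic to evict each line once it is done with it.

  #include <cstddef>
  #include <emmintrin.h>   // _mm_clflush (SSE2)

  // Read a buffer and flush each cache line as soon as it has been consumed,
  // so the lines stop competing with the other core's working set.
  // Assumes buf starts on a 64-byte cache-line boundary.
  long sum_and_evict(const int* buf, std::size_t n)
  {
      const std::size_t ints_per_line = 64 / sizeof(int);
      long sum = 0;
      for (std::size_t i = 0; i < n; ++i) {
          sum += buf[i];
          if ((i + 1) % ints_per_line == 0)
              _mm_clflush(&buf[i]);   // evict the line that was just finished
      }
      return sum;
  }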

moonshadow
I have two cores on one processor - one of them is heavily using the cache, and the other one has a lower priority, so I'm trying to decrease its cache usage. On this machine the L2 cache is shared between the two cores, so what I would like is for the memory to be loaded directly into L1, or into registers, in the lower-priority program. Could you please elaborate on how using prefetch and invalidate instructions can help me in this case? Many thanks.
Anna
My thought was to arrange for the low-priority process to explicitly kick out its cache lines as soon as it was done with them, thus freeing them up sooner than the CPU's cache management policy might otherwise permit, and perhaps to have the high-priority process issue prefetches in particularly expensive sections. Not sure how much such an approach would help in your scenario, though.
moonshadow
@moonshadow: Processors are not always good at deciding what to cache. That's the explicit reason the movntdqa instruction exists: it's for streaming data that, once used, is never touched again (at least not too soon ;-)). As for a helper thread, this might be viable if there is another thread available that can't be used fully otherwise - hyper-threading comes to mind. But in most cases you will get better results with two threads doing full work and using explicit prefetch instructions.
drhirsch
+5  A: 

The problem with the movntdqa instruction with an %xmm register as target (i.e. loading from memory) is that it is only available from SSE4.1 on. So far that means only the newer Core 2 (45 nm) or Core i7 processors. The other direction (storing data to memory) is available in earlier SSE versions.

For this instruction, the processor moves the data into one of a very small number of read buffers (Intel doesn't specify the exact size, but assume it is in the range of 16 bytes), where it is readily available, but it gets kicked out after a few other loads.

And it does not pollute the other caches, so if you have streaming data, your approach is viable.

Remember, you need to use an sfence instruction afterwards.
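In C++ the same thing can be written without inline asm via the SSE4.1 intrinsic for movntdqa; the sketch below is illustrative, not the asker's code (compile with -msse4.1, and note that movntdqa requires a 16-byte-aligned address, which a plain new int[] does not guarantee):

  #include <emmintrin.h>   // __m128i, _mm_cvtsi128_si32, _mm_sfence
  #include <smmintrin.h>   // SSE4.1: _mm_stream_load_si128 (movntdqa)

  // Streaming load of 16 bytes starting at my_mem; my_mem must be
  // 16-byte aligned or the instruction faults.
  int read_streaming(int* my_mem)
  {
      __m128i v = _mm_stream_load_si128(reinterpret_cast<__m128i*>(my_mem));
      _mm_sfence();                    // fence afterwards, as noted above
      return _mm_cvtsi128_si32(v);     // extract the low 32-bit element
  }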

Prefetching exists in two main variants: prefetcht0 (prefetch data into all cache levels) and prefetchnta (prefetch non-temporal data). Usually prefetching into all caches is the right thing to do; for a streaming-data loop the latter is better, provided you make consistent use of the streaming instructions.

You use it with the address of an object you want to access in the near future, usually some iterations ahead if you have a loop. The prefetch instruction doesn't wait or block; it just makes the processor start fetching the data at the specified memory location.
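A hedged sketch of such a loop using the prefetchnta intrinsic (the look-ahead distance of 16 elements is a placeholder you would have to tune for your data and hardware):

  #include <cstddef>
  #include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_NTA

  long stream_sum(const int* buf, std::size_t n)
  {
      const std::size_t ahead = 16;    // how many elements ahead to prefetch
      long sum = 0;
      for (std::size_t i = 0; i < n; ++i) {
          if (i + ahead < n)
              _mm_prefetch(reinterpret_cast<const char*>(buf + i + ahead),
                           _MM_HINT_NTA);   // non-temporal prefetch hint
          sum += buf[i];
      }
      return sum;
  }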

drhirsch