tags:

views:

1551

answers:

2

This is a somewhat low-level question. In x86 assembly there are two SSE instructions:

MOVDQA xmmi, m128

and

MOVNTDQA xmmi, m128

The IA-32 Software Developer's Manual says that the NT in MOVNTDQA stands for Non-Temporal, and that otherwise it's the same as MOVDQA.

My question is, what does Non-Temporal mean?

+10  A: 

Non-Temporal SSE instructions (MOVNTI, MOVNTQ, etc.), don't follow the normal cache-coherency rules. Therefore non-temporal stores must be followed by an SFENCE instruction in order for their results to be seen by other processors in a timely fashion.

When data is produced and not (immediately) consumed again, the fact that memory store operations read a full cache line first and then modify the cached data is detrimental to performance. This operation pushes data out of the caches which might be needed again in favor of data which will not be used soon. This is especially true for large data structures, like matrices, which are filled and then used later. Before the last element of the matrix is filled the sheer size evicts the first elements, making caching of the writes ineffective.

For this and similar situations, processors provide support for non-temporal write operations. Non-temporal in this context means the data will not be reused soon, so there is no reason to cache it. These non-temporal write operations do not read a cache line and then modify it; instead, the new content is directly written to memory.

Source: http://lwn.net/Articles/255364/

Espo
Nice answer, I would just like to point out that on the kind of processor with NT instructions, even with a non-non-temporal instruction (i.e. a normal instruction), the line cache is not "read and then modified". For a normal instruction writing to a line that is not in the cache, a line is reserved in the cache and a mask indicates what parts of the line are up-to-date. This webpage calls it "no stall on store": http://www.ptlsim.org/Documentation/html/node30.html . I couldn't find more precise references, I only heard about this from guys whose job is to implement processor simulators.
Pascal Cuoq
Actually http://www.ptlsim.org/ is a web site about a cycle-accurate processor simulator, exactly the same kind of thing the guys who told me about "no stall on store" are doing. I'd better mention them too in case they ever see this comment: http://unisim.org/
Pascal Cuoq
+2  A: 

Espo is pretty much bang on target. Just wanted to add my two cents:

The "non temporal" phrase means lacking temporal locality. Caches exploit two kinds of locality - spatial and temporal, and by using a non-temporal instruction you're signaling to the processor that you don't expect the data item be used in the near future.

I am a little skeptical about the hand-coded assembly that uses the cache control instructions. In my experience these things lead to more evil bugs than any effective performance increases.

Pramod