Has anyone had experience using prefetch instructions for the Core 2 Duo processor?
I've been using the (standard?) prefetch set (prefetchnta, prefetcht1, etc.) with success on a series of P4 machines, but when running the code on a Core 2 Duo it seems that the prefetcht(i) instructions do nothing, and that the prefetchnta instruction is less effective.
My criterion for assessing performance is the timing of a BLAS 1 vector-vector (axpy) operation, with vectors large enough that they no longer fit in cache.
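For concreteness, a minimal sketch of this kind of kernel is given below. It is an illustration only, not the exact code being timed: the function name axpy_prefetch, the PREFETCH_DISTANCE value, double precision, and the 64-byte cache line (8 doubles) are all assumptions. The hint is issued through the _mm_prefetch intrinsic with _MM_HINT_NTA, which compilers normally lower to prefetchnta on x86; on other targets the macro is a no-op.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)0)  /* no-op on non-x86 targets */
#endif

/* Assumed tuning knob: how many elements ahead of the current index to prefetch. */
#define PREFETCH_DISTANCE 64

/* Computes y[i] += a * x[i] for i in [0, n), with software prefetch hints. */
void axpy_prefetch(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        /* Issue one hint per assumed 64-byte line (every 8 doubles). */
        if ((i & 7) == 0 && i + PREFETCH_DISTANCE < n) {
            PREFETCH_NTA(&x[i + PREFETCH_DISTANCE]);
            PREFETCH_NTA(&y[i + PREFETCH_DISTANCE]);
        }
        y[i] += a * x[i];
    }
}
```

Swapping _MM_HINT_NTA for _MM_HINT_T0, _MM_HINT_T1 or _MM_HINT_T2 selects the prefetcht(i) variants, so the same loop can be timed with each hint and with the hints removed altogether.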
Have Intel introduced new prefetch instructions?