I am interested in mastering prefetch-related functions such as
_mm_prefetch(...)
so that when I loop over arrays, the available memory bandwidth is fully utilized. What are the best resources for learning about this?
I am doing this work in C with the GCC 4 series on an Intel Linux platform.
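
For context, the kind of loop I have in mind looks roughly like the sketch below. The function name `sum_with_prefetch` and the prefetch distance `PREFETCH_AHEAD` are placeholders I made up for illustration; the distance of 64 elements is a guess, not a tuned value, and I do not know whether software prefetch actually helps on a simple sequential access pattern like this one.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical prefetch distance, in elements; a guess, not a tuned value. */
#define PREFETCH_AHEAD 64

/* Sum an array while hinting that data a few iterations ahead will be needed. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* The prefetch is only a hint; the guard keeps the address inside the array. */
        if (i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)&a[i + PREFETCH_AHEAD], _MM_HINT_T0);
        total += a[i];
    }
    return total;
}
```

I would like to understand how to choose the hint and the distance properly, rather than guessing as above.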