Hints and allegations abound that arithmetic with NaNs can be 'slow' in hardware FPUs. Specifically in the modern x64 FPU, e.g on a Nehalem i7, is that still true? Do FPU multiplies get churned out at the same speed regardless of the values of the operands?
I have some interpolation code that can wander off the edge of our defined data, and I'm trying to determine whether it's faster to check for NaNs (or some other sentinel value) here there and everywhere, or just at convenient points.
Yes, I will benchmark my particular case (it could be dominated by something else entirely, like memory bandwidth), but I was surprised not to see a concise summary somewhere to help with my intuition.
I'll be doing this from the CLR, if it makes a difference as to the flavor of NaNs generated.