I'm trying to get GCC (or clang) to consistently use the SSE instruction for sqrt instead of the math library function, for a computationally intensive scientific application. I've tried a variety of GCC versions on various 32- and 64-bit OS X and Linux systems. I'm making sure to enable SSE with -mfpmath=sse (and -march=core2 to satisfy GCC's requirement for using -mfpmath=sse on 32-bit). I'm also using -O3. Depending on the GCC or clang version, the generated assembly doesn't consistently use SSE's sqrtss. In some versions of GCC, all the sqrts use the instruction. In others, there is a mix of sqrtss and calls to the math library function. Is there a way to give a hint or force the compiler to use only the SSE instruction?

+2  A: 

Use the sqrtss intrinsic __builtin_ia32_sqrtss?
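As a rough sketch of what that could look like, the same instructions are also reachable through the portable intrinsics `_mm_sqrt_ss` and `_mm_sqrt_sd` from `<emmintrin.h>`. The wrapper names `sse_sqrtf` and `sse_sqrt` are made up for this example, and the `__SSE2__` guard with a libm fallback is an assumption so the file still builds where SSE2 is not enabled.

```c
#include <math.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */
#endif

/* Hypothetical helper: single-precision sqrt via sqrtss when SSE2 is on. */
static inline float sse_sqrtf(float x)
{
#if defined(__SSE2__)
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
#else
    return sqrtf(x);    /* fallback: plain libm call */
#endif
}

/* Hypothetical helper: double-precision sqrt via sqrtsd when SSE2 is on. */
static inline double sse_sqrt(double x)
{
#if defined(__SSE2__)
    __m128d v = _mm_set_sd(x);
    /* The low lane is the square root of v's low lane; the upper lane
       is copied from the first argument and ignored here. */
    return _mm_cvtsd_f64(_mm_sqrt_sd(v, v));
#else
    return sqrt(x);     /* fallback: plain libm call */
#endif
}
```

Unlike the libm functions, these wrappers never set errno for negative inputs; they just return NaN.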

MSN
A: 

You should be careful using that; as you probably know, it has less precision. That is probably why gcc doesn't use it systematically.

There is a trick that is even mentioned in Intel's SSE manual (I hope I remember correctly). The result of sqrtss is only one Heron iteration away from the target. Maybe gcc is able to inline that brief surrounding iteration in some versions and not in others.
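For reference, a single Heron (Newton) step that refines an approximate square root estimate can be sketched as follows. The function name `heron_step` is hypothetical, and this is only an illustration of the iteration itself, not of what any particular gcc version emits.

```c
#include <assert.h>

/* One Heron (Newton) step for sqrt(x), starting from a positive estimate.
   Each step replaces the estimate by the mean of estimate and x/estimate. */
static double heron_step(double x, double estimate)
{
    assert(estimate > 0.0);  /* the step divides by the estimate */
    return 0.5 * (estimate + x / estimate);
}
```

Starting from a reasonably close estimate, each step roughly doubles the number of correct digits, which is why one iteration can be enough after a good hardware approximation.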

You could use the builtin as MSN says, but you should definitely look up the specs on Intel's web site to know what you are trading.

Jens Gustedt
I know that it's losing precision; however, I'm hoping it's more consistent between different OSes. The standard math library sqrt function isn't particularly standard, and it gives different results on different platforms. Speed and consistency are more important in this case.
arsenm
I'm not sure where you got this information, but it is incorrect. `sqrtss` is an IEEE-754 correctly rounded single-precision square root. Perhaps you are thinking instead of `rsqrtss`, which is a fast approximate reciprocal square root.
Stephen Canon
@arsenm: The standard math library sqrt functions are completely standardized, and the results are not allowed to vary between platforms that conform to Annex F of the C standard: "The sqrt functions in <math.h> provide the IEC 60559 square root operation." IEC 60559 (IEEE-754), in turn, fully specifies the semantics of square root.
Stephen Canon
@Stephen: ah, maybe I mixed them up. But then `sqrtss` must be relatively recent, SSE4 or so?
Jens Gustedt
@Jens Gustedt: Actually `sqrtss` is from the original SSE extension.
Stephen Canon
@Stephen: also, `sqrtss` is for 32-bit floating-point values, so `sqrtf` should be the one to compare with. It seems that one has only been included since C99 and is not in C89.
Jens Gustedt
@arsenm: if you are willing to trade the lack of precision, or if you have `float` anyhow, try to use `sqrtf` instead of `sqrt` and see if this gets more reliably compiled into `sqrtss`.
Jens Gustedt
Actually I'm using SSE and SSE2 with double, but the same problem exists: the SSE instruction and the libm function are mixed. I need more precision than float but less than double. sqrt may be standard under IEEE-754, but other standard math functions are not. I was having issues with inconsistent sqrt, which made me think it might have been nonstandard, but it may just have been related to this unfortunate mixing. Using the intrinsics is inconvenient. I've managed to get a scalar sqrt with them, but it ends up being slightly slower than when gcc figures out how to use the instruction itself, as well as being uglier.
arsenm