I'm writing realtime numeric software, in C++, currently compiling it with Visual-C++ 2008.
I'm currently using the 'fast' floating-point model (/fp:fast), which enables various optimizations. Most of them are useful in my case, but one specifically:
a/b -> a*(1/b) Division by multiplicative inverse
is too numerically unstable for a lot of my calculations.
(see: Microsoft Visual C++ Floating-Point Optimization)
Switching to /fp:precise makes my application run more than twice as slow. Is it possible to either fine-tune the optimizer (i.e. disable this specific optimization), or somehow manually bypass it?
Actual minimal code example:
void test(float a, float b, float c,
          float &ret0, float &ret1) {
    ret0 = b/a;
    ret1 = c/a;
}
[My actual code is mostly matrix-related algorithms.]
The output from VC (cl, version 15, x86) is:
divss xmm0,xmm1
mulss xmm2,xmm0
mulss xmm1,xmm0
Having one div instead of two is a big problem numerically (xmm0 is preloaded with 1.0f from RAM): depending on the values in xmm1 and xmm2 (which may be in different ranges), you might lose a lot of precision. Compiling without SSE outputs similar stack-based x87 FPU code.
Wrapping the function with
#pragma float_control( precise, on, push )
...
#pragma float_control(pop)
does solve the accuracy problem, but firstly, it's only available at function level (global scope), and secondly, it prevents inlining of the function (i.e. the speed penalties are too high).
The 'precise' output is also being cast to 'double' and back:
divsd xmm1,xmm2
cvtsd2ss xmm1,xmm1
divsd xmm1,xmm0
cvtpd2ps xmm0,xmm1