I'm writing real-time numeric software in C++, currently compiling it with Visual C++ 2008. I'm using the 'fast' floating-point model (/fp:fast), which enables various optimizations. Most of them are useful in my case, but one specifically:

a/b -> a*(1/b) Division by multiplicative inverse

is too numerically unstable for a lot of my calculations.

(see: Microsoft Visual C++ Floating-Point Optimization)

Switching to /fp:precise makes my application run more than twice as slow. Is it possible to either fine-tune the optimizer (i.e. disable this specific optimization), or somehow manually bypass it?

- Actual minimal-code example: -

void test(float a, float b, float c,
    float &ret0, float &ret1) {
  ret0 = b/a;
  ret1 = c/a;
} 

[my actual code is mostly matrix related algorithms]

Output from VC (cl, version 15, x86):

divss       xmm0,xmm1 
mulss       xmm2,xmm0 
mulss       xmm1,xmm0 

Having one div instead of two is a big problem numerically (xmm0 is preloaded with 1.0f from RAM): depending on the values in xmm1 and xmm2 (which may be in different ranges), you might lose a lot of precision. Compiling without SSE outputs similar stack-based x87 FPU code.
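To make the precision issue concrete, here is a minimal sketch (not taken from the code in question) that compares a direct division with the multiply-by-reciprocal form in single precision. The function names are hypothetical. The `volatile` is only there so the intermediate 1/b is stored as a rounded float; it assumes the compiler evaluates float expressions in float precision (e.g. SSE code).

```cpp
#include <cassert>
#include <cstddef>

// Direct division: one correctly rounded operation.
float divide_direct(float a, float b) {
    assert(b != 0.0f);
    return a / b;
}

// Division via the reciprocal: two rounded operations (1/b, then a * inv).
float divide_by_reciprocal(float a, float b) {
    assert(b != 0.0f);
    volatile float inv = 1.0f / b;  // force the intermediate to be a rounded float
    return a * inv;
}

// Counts the integer pairs (a, b) in [1, n] x [1, n] where the two forms disagree.
std::size_t count_mismatches(int n) {
    std::size_t mismatches = 0;
    for (int a = 1; a <= n; ++a) {
        for (int b = 1; b <= n; ++b) {
            const float fa = static_cast<float>(a);
            const float fb = static_cast<float>(b);
            if (divide_direct(fa, fb) != divide_by_reciprocal(fa, fb)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

For example, 5.0f/3.0f and 5.0f*(1.0f/3.0f) round to neighbouring floats, while divisors that are powers of two (such as 4.0f) give identical results because their reciprocal is exact.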

Wrapping the function with

#pragma float_control( precise, on, push )
...
#pragma float_control(pop)

does solve the accuracy problem, but first, it's only available at function level (global scope), and second, it prevents inlining of the function (i.e. the speed penalty is too high).

The 'precise' output is also being cast to 'double' and back:

 divsd       xmm1,xmm2 
 cvtsd2ss    xmm1,xmm1 
 divsd       xmm1,xmm0 
 cvtpd2ps    xmm0,xmm1 
+2  A: 

That document states that you can control the floating-point optimisations on a line-by-line basis using pragmas.

Oli Charlesworth
A: 

Can you put the functions containing those calculations in a separate source code file and compile only that file with the different settings?

I don't know if that is safe though; you'll need to check!
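A rough sketch of what that could look like, assuming the sensitive divisions are moved into their own file. The file names and the function name `divide_pair` are hypothetical; the `cl` lines in the comments only illustrate the idea of using a different /fp setting per file.

```cpp
// precise_div.cpp (hypothetical file name)
// Compiled on its own with the precise model, for example:
//   cl /c /O2 /fp:precise precise_div.cpp
// while the rest of the project keeps the fast model:
//   cl /c /O2 /fp:fast main.cpp
#include <cassert>

// Declaration that would normally live in a shared header.
void divide_pair(float a, float b, float c, float &ret0, float &ret1);

// Two independent divisions by the same divisor a.
void divide_pair(float a, float b, float c, float &ret0, float &ret1) {
    assert(a != 0.0f);
    ret0 = b / a;
    ret1 = c / a;
}
```

Keep in mind that calls across source files are generally not inlined unless link-time code generation is used, so the inlining concern from the question may still apply.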

John Burton
+3  A: 

Add the

#pragma float_control( precise, on)

before the computation and

#pragma float_control( precise,off)

after that. I think that should do it.
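As a sketch, wrapping the function from the question could look like the following. It uses the push/pop form so the previous settings are restored afterwards. The `_MSC_VER` guard and the function name `divide_pair_precise` are my additions, so that other compilers simply skip the pragma.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#pragma float_control(precise, on, push)  // MSVC only: save state, switch to precise
#endif

// Both divisions are meant to be performed as real divisions under 'precise'.
void divide_pair_precise(float a, float b, float c, float &ret0, float &ret1) {
    assert(a != 0.0f);
    ret0 = b / a;
    ret1 = c / a;
}

#if defined(_MSC_VER)
#pragma float_control(pop)                // restore the previous floating-point settings
#endif
```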

Gangadhar
Yes, that's what I'm currently using: #pragma float_control( precise, on, push ) ... #pragma float_control(pop). Unfortunately, it seems to work only at whole-function level, and it prevents inlining. I'll add a comment with my current findings. It would really rock if there were a more fine-tuned float-optimisation option.
oyd11
A: 

(Weird) solution which I have found: whenever dividing by the same value in a function, add some epsilon:

    a/b; c/b 

->

    a/(b+eps1); c/(b+eps2)

Also saves you from the occasional div by zero
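Applied to the function from the question, a minimal sketch might look like this. The constants `kEps1` and `kEps2` and their values are hypothetical; they only need to differ so that the two divisors are distinct expressions. Whether this really stops the compiler from sharing one reciprocal is something to verify in the generated assembly.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical epsilons: tiny, and different from each other.
const float kEps1 = 1e-30f;
const float kEps2 = 2e-30f;

// Each division gets its own, slightly different divisor.
void divide_pair_eps(float a, float b, float c, float &ret0, float &ret1) {
    ret0 = b / (a + kEps1);
    ret1 = c / (a + kEps2);
}
```

For divisors of ordinary magnitude the epsilon is far below float resolution and leaves the result unchanged, while a divisor of exactly zero becomes the epsilon itself.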

oyd11
A: 

There is also __assume. You can use __assume(a/b != (a*(1/b))). I've never actually used __assume, but in theory it exists exactly to fine-tune the optimizer.

DeadMG
Good to know about __assume(). Other compilers I have used before had similar options (NASSERT in TI's compilers). I will try that; however, I think the compiler already assumes (a/b != (a*(1/b))).
oyd11