views:

414

answers:

6

The problem.

Microsoft Visual C++ 2005 compiler, 32bit windows xp sp3, amd 64 x2 cpu.

Code:

double a = 3015.0; 
double b = 0.00025298219406977296;
//*((unsigned __int64*)(&a)) == 0x40a78e0000000000  
//*((unsigned __int64*)(&b)) == 0x3f30945640000000  
double f = a/b;//3015/0.00025298219406977296;

the result of calculation (i.e. "f") is 11917835.000000000 (*((unsigned __int64*)(&f)) == 0x4166bb4160000000) although it should be 11917834.814763514 (i.e. *((unsigned __int64*)(&f)) == 0x4166bb415a128aef).
I.e. fractional part is lost.
Unfortunately, I need fractional part to be correct.

Questions:
1) Why does this happen?
2) How can I fix the problem?

Additional info:
0) The result is taken directly from "watch" window (it wasn't printed, and I didn't forget to set printing precision). I also provided hex dump of floating point variable, so I'm absolutely sure about calculation result.
1) The disassembly of f = a/b is:

fld         qword ptr [a]  
fdiv        qword ptr [b]  
fstp        qword ptr [f]  

2) f = 3015/0.00025298219406977296; yields correct result (f == 11917834.814763514 , *((unsigned __int64*)(&f)) == 0x4166bb415a128aef ), but it looks like in this case result is simply calculated during compile-time:

fld         qword ptr [__real@4166bb415a128aef (828EA0h)]  
fstp        qword ptr [f]  

So, how can I fix this problem?

P.S. I've found a temporary workaround (i need only fractional part of division, so I simply use f = fmod(a/b)/b at the moment), but I still would like to know how to fix this problem properly - double precision is supposed to be 16 decimal digits, so such calculation isn't supposed to cause problems.

+1  A: 

I'd guess you're printing out the number without specifying a precision. Try this:

#include <iostream>
#include <iomanip>

int main() { 
    double a = 3015.0; 
    double b = 0.00025298219406977296;
    double f = a/b;

    std::cout << std::fixed << std::setprecision(15) << f << std::endl;
    return 0;
}

This produces:

11917834.814763514000000

Which looks correct to me. I'm using VC++ 2008 instead of 2005, but I'd guess the difference is in your code, not the compiler.

Jerry Coffin
Nope, I'm not printing number, the result is taken directly from "watch" window.
SigTerm
Have you tried printing it ? Maybe the bug is in the watch window !!
Martin Beckett
@Martin The watch window shows the full precision.
jleedev
+1  A: 

UPDATE: Perhaps the watch-window is confused because of the cast to *_int64.

It works fine under VC++ 2008

alt text

Jive Dadson
To clarify, the casts are inserted by VC++ in the disassembly to illustrate what the floating-point load represents, and are not part of the code.
jleedev
I see that. I updated the comment. Thanks.
Jive Dadson
+3  A: 

Interestingly, if you declare both a and b as floats, you will get exactly 11917835.000000000. So my guess is that there is a conversion to single precision happening somewhere, either in how the constants are interpreted or later on in the calculations.

Either case is a bit surprising, though, considering how simple your code is. You are not using any exotic compiler directives, forcing single precision for all floating point numbers?

Edit: Have you actually confirmed that the compiled program generates an incorrect result? Otherwise, the most likely candidate for the (erroneous) single precision conversion would be the debugger.

Josef Grahn
There is no cast to single precision as the disassembly clearly shows.
Axel Gneiting
Not on those three lines, anyway.
Josef Grahn
+1  A: 

If you need precise math, don't use floating point.

Do yourself a favor and get a BigNum library with rational number support.

Frank Krueger
He doesn't need 11917834.814763514100059144562708, he just needs 11917834.814763514. Giving up orders of magnitude in performance and memory just to get the precision built into the machine seems a bit irrational (pardon the pun).
Gabe
Sure, we aren't entitled to expect exactitude, but We are still entitled to ask for that level of correctness that the floating-point specification promises us!
AakashM
No offense, but I think using bignums just for one calculation is a bit too much, at least in this case.
SigTerm
A: 

Are you sure you're examining the value of f right after the fstp instruction? If you've got optimizations turned on perhaps the watch window could be showing a value taken at some later point (this seems a bit plausible as you say you're looking at the fractional part of f later - does some instruction wind up masking it out somehow?)

Mike Dinsdale
+11  A: 

Are you using directx in your program anywhere as that causes the floating point unit to get switched to single precision mode unless you specifically tell it not to when you create the device and would cause exactly this

John Burton
This is a correct answer. Program uses Direct3D, and, of course, calculation happens after device creation. The funny thing is that I knew about D3D adjusting FPU precision, but I totally forgot about it, because I haven't seen this error in a last few years.Problem solved.
SigTerm
What flag should be used when creating the device? Does the same issue exist with Direct2D?
dalle