I can only assume this is a bug. The first assert passes while the second fails:
double sum_1 = 4.0 + 6.3;
assert(sum_1 == 4.0 + 6.3);
double t1 = 4.0, t2 = 6.3;
double sum_2 = t1 + t2;
assert(sum_2 == t1 + t2);
If not a bug, why?
I can only assume this is a bug. The first assert passes while the second fails:
double sum_1 = 4.0 + 6.3;
assert(sum_1 == 4.0 + 6.3);
double t1 = 4.0, t2 = 6.3;
double sum_2 = t1 + t2;
assert(sum_2 == t1 + t2);
If not a bug, why?
You are comparing floating point numbers. Don't do that, floating point numbers have inherent precision error in some circumstances. Instead, take the absolute value of the difference of the two values and assert that the value is less than some small number (epsilon).
void CompareFloats( double d1, double d2, double epsilon )
{
assert( abs( d1 - d2 ) < epsilon );
}
This has nothing to do with the compiler and everything to do with the way floating point numbers are implemented. here is the IEEE spec:
http://www.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
Comparisons of double precision numbers are inherently inaccurate. For instance, you can often find 0.0 == 0.0
returning false. This is due to the way the FPU stores and tracks numbers.
Testing for equality is problematic. Two computational sequences that are mathematically equal may well produce different floating-point values.
You will need to use a delta to give a tolerance for your comparisons, rather than an exact value.
It may be that in one of the cases, you end up comparing a 64-bit double to an 80-bit internal register. It may be enlightening to look at the assembly instructions GCC emits for the two cases...
This is something that has bitten me, too.
Yes, floating point numbers should never be compared for equality because of rounding error, and you probably knew that.
But in this case, you're computing t1+t2
, then computing it again. Surely that has to produce an identical result?
Here's what's probably going on. I'll bet you're running this on an x86 CPU, correct? The x86 FPU uses 80 bits for its internal registers, but values in memory are stored as 64-bit doubles.
So t1+t2
is first computed with 80 bits of precision, then -- I presume -- stored out to memory in sum_2
with 64 bits of precision -- and some rounding occurs. For the assert, it's loaded back into a floating point register, and t1+t2
is computed again, again with 80 bits of precision. So now you're comparing sum_2
, which was previously rounded to a 64-bit floating point value, with t1+t2
, which was computed with higher precision (80 bits) -- and that's why the values aren't exactly identical.
Edit So why does the first test pass? In this case, the compiler probably evaluates 4.0+6.3
at compile time and stores it as a 64-bit quantity -- both for the assignment and for the assert. So identical values are being compared, and the assert passes.
Second Edit Here's the assembly code generated for the second part of the code (gcc, x86), with comments -- pretty much follows the scenario outlined above:
// t1 = 4.0
fldl LC3
fstpl -16(%ebp)
// t2 = 6.3
fldl LC4
fstpl -24(%ebp)
// sum_2 = t1+t2
fldl -16(%ebp)
faddl -24(%ebp)
fstpl -32(%ebp)
// Compute t1+t2 again
fldl -16(%ebp)
faddl -24(%ebp)
// Load sum_2 from memory and compare
fldl -32(%ebp)
fxch %st(1)
fucompp
Interesting side note: This was compiled without optimization. When it's compiled with -O3
, the compiler optimizes all of the code away.
I've duplicated your problem on my Intel Core 2 Duo, and I looked at the assembly code. Here's what's happening: when your compiler evaluates t1 + t2
, it does
load t1 into an 80-bit register
load t2 into an 80-bit register
compute the 80-bit sum
When it stores into sum_2
it does
round the 80-bit sum to a 64-bit number and store it
Then the ==
comparison compares the 80-bit sum to a 64-bit sum, and they're different, primarily because the fractional part 0.3 cannot be represented exactly using a binary floating-point number, so you are comparing a 'repeating decimal' (actually repeating binary) that has been truncated to two different lengths.
What's really irritating is that if you compiler with gcc -O1
or gcc -O2
, gcc does the wrong arithmetic at compile time, and the problem goes away. Maybe this is OK according to the standard, but it's just one more reason that gcc is not my favorite compiler.
P.S. When I say that ==
compares an 80-bit sum with a 64-bit sum, of course I really mean it compares the extended version of the 64-bit sum. You might do well to think
sum_2 == t1 + t2
resolves to
extend(sum_2) == extend(t1) + extend(t2)
and
sum_2 = t1 + t2
resolves to
sum_2 = round(extend(t1) + extend(t2))
Welcome to the wonderful world of floating point!
When comparing floating point numbers for closeness you usually want to measure their relative difference, which is defined as
if (abs(x) != 0 || abs(y) != 0)
rel_diff (x, y) = abs((x - y) / max(abs(x),abs(y))
else
rel_diff(x,y) = max(abs(x),abs(y))
For example,
rel_diff(1.12345, 1.12367) = 0.000195787019
rel_diff(112345.0, 112367.0) = 0.000195787019
rel_diff(112345E100, 112367E100) = 0.000195787019
The idea is to measure the number of leading significant digits the numbers have in common; if you take the -log10 of 0.000195787019 you get 3.70821611, which is about the number of leading base 10 digits all the examples have in common.
If you need to determine if two floating point numbers are equal you should do something like
if (rel_diff(x,y) < error_factor * machine_epsilon()) then
print "equal\n";
where machine epsilon is the smallest number that can be held in the mantissa of the floating point hardware being used. Most computer languages have a function call to get this value. error_factor should be based on the number of significant digits you think will be consumed by rounding errors (and others) in the calculations of the numbers x and y. For example, if I knew that x and y were the result of about 1000 summations and did not know any bounds on the numbers being summed, I would set error_factor to about 100.
Tried to add these as links but couldn't since this is my first post:
This "problem" can be "fixed" by using these options:
-msse2 -mfpmath=sse
as explained on this page:
http://www.network-theory.co.uk/docs/gccintro/gccintro%5F70.html
Once I used these options, both asserts passed.