In mathematics, the identity (1 + sqrt(2))^2 = 3 + 2*sqrt(2) holds. But in floating-point (IEEE 754, single precision, i.e. 32 bits) calculations it isn't guaranteed to hold, since sqrt(2) doesn't have an exact representation in binary.

So does using an approximated value of sqrt(2) give different results for the left- and right-hand sides? If so, why? Does squaring the approximated value reduce accuracy significantly?

Which of the equivalent expressions then gives the most accurate result?

+12  A: 

Since even 0.1 + 0.2 != 0.3, you shouldn't count on such complex equalities holding for limited-precision floating-point numbers.

Since the numbers are stored rounded to a certain number of binary digits, they aren't exact whenever the exact number (like 0.1) would need infinitely many binary digits. Therefore the results of computations with those numbers won't be exact either, and small differences from the exact result of a computation are to be expected.
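For example, a minimal C++ sketch (assuming IEEE 754 doubles, as on virtually all current hardware) makes the rounding visible:

#include <cstdio>

int main()
{
    double a = 0.1 + 0.2;
    // 0.1 and 0.2 are rounded when stored, so their sum is not exactly 0.3
    std::printf("%.17g\n", a);         // prints 0.30000000000000004
    std::printf("%d\n", a == 0.3);     // prints 0
    return 0;
}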

sth
You shouldn't count on it, but it actually turns out that both sides of this particular (not-so-complex) equation are exactly equal in single-precision floating point.
tgamblin
And in double, too.
Stephen Canon
+4  A: 

So does using an approximated value of sqrt(2) give different results for the left- and right-hand sides? If so, why?

Mathematically, this equality only works because of an exact relationship between these numbers (it has to do with lengths of the sides of a triangle). If you add fuzziness in the form of inexact representation, the equality is no longer true. Equality is a binary proposition, so the question is no longer "which side is right", but rather "is this relationship true at all?". And the answer is, "no, it isn't true anymore".

Does squaring the approximated value reduce accuracy significantly?

Every operation on two floating-point values is likely to reduce their accuracy. A very small subset of operations for certain numbers -- those with exact bit representations -- can be guaranteed not to worsen accuracy.
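For example, a small sketch (in C++, assuming IEEE 754 doubles) contrasting an operation on exactly representable values with one on an approximated value:

#include <cmath>
#include <cstdio>

int main()
{
    // 0.5 and 0.25 are exactly representable, so this sum is exact
    std::printf("%d\n", 0.5 + 0.25 == 0.75);     // prints 1

    // sqrt(2) is already rounded, and squaring it rounds again
    double s = std::sqrt(2.0);
    std::printf("%.17g\n", s * s);               // prints 2.0000000000000004
    std::printf("%d\n", s * s == 2.0);           // prints 0
    return 0;
}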

John Feminella
"(it has to do with lengths of the sides of a triangle)" - not really, just multiply out the brackets on the lhs: `(1 + sqrt(2))*(1 + sqrt(2)) == 1*1 + sqrt(2)*sqrt(2) + 2*1*sqrt(2)`. I'm sure it's possible to come up with a geometrical proof too.
Steve Jessop
@Steve Jessop: You can get it by noting that the equation for the lengths of the sides of a right triangle, `a^2 + b^2 = c^2`, fits this perfectly, with `a = sqrt(3)`, `b = sqrt(2 * sqrt(2))`, and `c = 1 + sqrt(2)`. Pythagoras showed it originally, and this particular variation was in Euclid's _Elements_.
John Feminella
Nice, but I bet my algebraic proof is shorter than the geometric proof that the triangle you specify is indeed right-angled. I confess though that I never did much Euclidean geometry, so to me it always seems easier just to do algebra on co-ordinates than to draw pretty pictures. Just constructing `sqrt(2*sqrt(2))` looks like a lot of trouble to me, never mind constructing a right-angled triangle on top of that ;-)
Steve Jessop
And yes, I do find it amusing that I have a masters degree in mathematics, but if I was sent back in time more than about 50 years, I'd (initially) struggle with schoolboy maths. Anywhere in the range 100-400 years, I would find the cutting edge of mathematics research considerably easier than the classroom stuff...
Steve Jessop
I guess my point is that this doesn't require "constructing" anything, just noting that it fits a pattern. For example, if `a` were 300 and `b` were 400, I'm already familiar with the (3,4,5) right triangle, so I wouldn't have to actually multiply anything to know that `c` is definitely 500.
John Feminella
+1  A: 

Look at the bright side: If you re-work that equation to remove the sqrts, then since you'll be dealing with reasonably sized whole numbers, the equation will be exact in floating point ;)

Inaccuracies are usually associated with numbers whose fractional parts can't be written as a finite sum of powers of .5 (0.2, for example).
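For example, a quick sketch of the whole-number case (assuming IEEE 754 doubles, where whole numbers up to 2^53 and their sums and products within that range are exact):

#include <cstdio>

int main()
{
    // sums and products of moderate whole numbers (well below 2^53) are exact in double precision
    double a = 12345.0, b = 54321.0;
    std::printf("%d\n", a * b == 670592745.0);                        // prints 1
    std::printf("%d\n", (a + b) * (a + b) == a*a + 2*a*b + b*b);      // prints 1
    return 0;
}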


To answer another part of your question: No, the representation of sqrt(2) is indeed the same on both sides. The errors (and differences) are not introduced until you start applying (different) operations to that same number on both sides: adding 1 vs. multiplying by 2, etc.

Carl Smotricz
A: 

When comparing floating point values I find it's best to compare the absolute value of the difference to a given tolerance. You can always count on that.

duffymo
Better to test the relative error, not the absolute error. Floating-point numbers are not uniformly distributed; the spacing between them grows as their magnitude grows.
lhf
Agreed, but in this case, we're talking about a residual that would be much less than one.
duffymo
A: 

In general the two sides are going to give you different results. Floating-point arithmetic is not associative or distributive, and there are a number of factors involved, including compiler options and hardware.
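For example, a minimal sketch of the associativity failure (assuming IEEE 754 doubles and no fast-math reassociation by the compiler):

#include <cstdio>

int main()
{
    double x = (0.1 + 0.2) + 0.3;   // rounds differently than ...
    double y = 0.1 + (0.2 + 0.3);   // ... this grouping
    std::printf("%.17g\n%.17g\n%d\n", x, y, x == y);
    // prints 0.60000000000000009, then 0.59999999999999998, then 0
    return 0;
}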

For your equation you can probably find out which side is more accurate (my guess is the multiplied-out side), but it is not going to hold in general if you decide to use different values, i.e. one side may be more accurate for certain values, whereas the other side is more accurate for other values.

Squaring should not affect the results significantly in your case.

aaa
+1  A: 

The person who defined the equality comparator for floats in C++ should be shot :>. Many reasonable languages (like SML) don't have a comparison operator for floats. I usually use the following code:

#include <limits>   // std::numeric_limits, for the epsilon default

// 'abs' is my own templated absolute-value helper (see the note below)
template < typename T >
inline bool equals( T x, T y, T precision = std::numeric_limits<T>::epsilon() )
{
    return abs( x - y ) <= precision;
}

Note: abs is also a templated function here; the epsilon default is stored outside. The equals part of the comparison (<= rather than <) is intentional for my purposes.

Kornel Kisielewicz
Is `abs` overloaded in the Standard? I thought it wasn't, which would make this incorrect unless you use the appropriate member of the `fabs` family.
ephemient
Yes, it is, in `<cmath>`.
greyfade
I use my own templated implementation of abs; that's what I said in the note.
Kornel Kisielewicz
Please note that (1 + sqrt(2))^2 = 3 + 2*sqrt(2) is not a comparison in C++! (I didn't use '==') I meant that the identity holds in mathematics.
Naximus
using 'epsilon' here means that your 'equals' gives the same result as '==' if 'x' or 'y' is greater than 2.0 (as it is in the example up top).
Chris Dodd
So `equals(a,b)` can be true, and `equals(b,c)` too, even when `equals(a,c)` is false? Say what you will about `==`, but at least it's transitive.
MSalters
The equality comparator for floats is required for IEEE754 compliance. And no, those of us responsible for it should *not* be shot. Exact equality tests are often necessary in floating point code, even if you personally don't use them.
Stephen Canon
+4  A: 

Generally I use [(1 + sqrt(2))^2] - [3 + 2*sqrt(2)] < 0.00001 to test equality in such conditions (of course for some cases I ignore this usage)

Is there a better way?

comments are appreciated :)

rahmivolkan
For rounding error you can probably get away with a smaller tolerance. Single-precision FP is about 7 decimal digits of precision, and double-precision is about 16 decimal digits, so 1e-6 and 1e-15 work well for normalized tolerances.
tgamblin
Actually, the tolerance should scale with the values. For very large values, the tolerances will be higher than 0.00001 and for small values, the tolerances could be much lower.
Adisak
That's what "normalized" means.
tgamblin
Thank you for your comments. It means that I am doing fine, but not the best, on this issue!
rahmivolkan
I think it should be fabs([(1 + sqrt(2))^2] - [3 + 2*sqrt(2)]) < 0.00001
nikie
You should use a combination of absolute and relative tolerance: Knuth discusses it, and look at fcmp on sourceforge/google for an implementation.
Alok
A: 

So does using an approximated value of sqrt(2) give different results for the left- and right-hand sides? If so, why? Does squaring the approximated value reduce accuracy significantly?

Both the addition and the multiplication introduce approximation error, and the errors compound, especially when the operations are nested.

The following is not an accurate representation, but it helps to illustrate my point:

example with addition outermost:
float1 * float2 + float3
= float1 * float2 + float3 + mult_approximation + add_approximation

example with multiplication outermost:
float1 * (float2 + float3)
= float1 * (float2 + float3 + add_approximation)
= float1 * (float2 + float3) + add_approximation * float1 + mult_approximation
Phong
A: 

I use the following code usually:

Equality is supposed to be transitive, that is: equals(a, b) && equals(b, c) => equals(a, c). Your equals does not guarantee that and is thus broken.
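A self-contained sketch of the failure, re-declaring an `equals` like the one quoted above with its default epsilon tolerance:

#include <cmath>
#include <cstdio>
#include <limits>

template <typename T>
bool equals(T x, T y, T precision = std::numeric_limits<T>::epsilon())
{
    return std::abs(x - y) <= precision;
}

int main()
{
    double eps = std::numeric_limits<double>::epsilon();
    double a = 0.0, b = 0.6 * eps, c = 1.2 * eps;
    // a ~ b and b ~ c, but a and c differ by more than the tolerance
    std::printf("%d %d %d\n", equals(a, b), equals(b, c), equals(a, c));   // prints 1 1 0
    return 0;
}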

Fred
A: 

It's because continuous (infinite) functions like sqrt(x) can't be represented exactly on a discrete (finite) state machine. Instead, continuous functions are translated into discrete approximations, e.g. via a Taylor series expansion taken from 0 to n, where n is limited by what the machine can represent. Because you can't take the sum from 0 to infinity on a computer, you're left with some remainder of error. This error can be calculated, so you can determine how close your discrete function is to the continuous function.

For more information and pretty TeX representations of the equations involved: http://en.wikipedia.org/wiki/Taylor_series
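As a rough illustration of truncation error (a sketch of the idea only; library sqrt implementations don't actually work this way), here is a binomial/Taylor series for sqrt(1+x) truncated after n terms and evaluated at x = 1:

#include <cmath>
#include <cstdio>

// binomial (Taylor) series for sqrt(1+x), truncated after n terms
double sqrt1p_series(double x, int n)
{
    double sum = 0.0, coeff = 1.0, xk = 1.0;   // coeff = C(1/2, k), xk = x^k
    for (int k = 0; k < n; ++k) {
        sum += coeff * xk;
        coeff *= (0.5 - k) / (k + 1);          // C(1/2, k+1) from C(1/2, k)
        xk *= x;
    }
    return sum;
}

int main()
{
    // the remainder after n terms is the truncation error the answer refers to
    const int ns[] = {5, 20, 80};
    for (int n : ns) {
        double approx = sqrt1p_series(1.0, n);
        std::printf("n = %2d  approx = %.15f  error = %.3e\n",
                    n, approx, std::fabs(approx - std::sqrt(2.0)));
    }
    return 0;
}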

Jim Wallace
+4  A: 

Beware, relying only on the absolute difference can cause problems. It works for small numbers around 1, which have enough precision to differ by 1e-5 or whatever tolerance you use. But think about larger numbers: their digits have to be stored in limited space (the mantissa), and only the most significant digits are kept. What does that mean? That there is no precision left to represent differences as small as 1e-5!

Wrapping up, it is better to use absolute AND relative comparison at the same time.

#include <algorithm>   // std::max
#include <cmath>       // std::fabs

const float eps = 1e-5f;   // tolerance; pick whatever suits your problem

bool equal(float a, float b)
{
   if (std::fabs(a-b) < eps) return true;                                         // absolute comparison
   if (std::fabs(a-b) / std::max(std::fabs(a), std::fabs(b)) < eps) return true;  // relative comparison
   return false;
}

Ahh, how am I going to score at the TopCoder tomorrow, when I can not write 3 lines of code:-D

supo
+1 for absolute and relative tolerances, but I don't think this code is correct even now. The expression for the second `if` condition should be `return true;` and `return false;` should be the fallback.
Alok
Preserving the absolute error case breaks it for small numbers, and the relative error case is too complicated and uses (slow) division. Try `if ( abs(a-b) < ( std::numeric_limits<float>::epsilon() * (1<<4) ) * abs(a) )`, replacing 4 with the tolerance in error-bits.
Potatoswatter
+1  A: 

In double precision, (1 + sqrt(2))^2 = 3 + 2*sqrt(2) seems to hold. See C code.
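A minimal sketch of such a check in C++:

#include <cmath>
#include <cstdio>

int main()
{
    double s = std::sqrt(2.0);
    double lhs = (1.0 + s) * (1.0 + s);
    double rhs = 3.0 + 2.0 * s;
    std::printf("lhs = %.17g\nrhs = %.17g\nequal: %d\n", lhs, rhs, lhs == rhs);
    // on an IEEE 754 double platform both values agree and "equal: 1" is printed
    // (see the detailed answer by Stephen Canon below for why)
    return 0;
}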

lhf
You can't rely on that fact though. In general, you don't want to compare floating-point numbers for equality.
Alok
Also, my guess is that your compiler simplified the code a lot and it doesn't actually compute and compare what you expect.
mxp
I don't think it's due to compiler optimizations. I've also tried it in Python and Lua, with the same results.
lhf
See the detailed answer by Stephen Canon.
lhf
A: 

Surprisingly, if for some reason you need an accurate representation of non-rational numbers (hint: you probably don't), there is something you can do: continued-fraction arithmetic. The idea goes back to 1972 and is due to the super-hacker Bill Gosper - google it. By the way, more advanced aspects of this idea are a matter of current research in mathematics; see e.g. this paper.
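To give a taste of the idea, here is a sketch that only computes the convergents of sqrt(2) = [1; 2, 2, 2, ...] (Gosper's arithmetic on whole continued fractions is considerably more involved):

#include <cstdio>

int main()
{
    // convergents p/q of the continued fraction [1; 2, 2, 2, ...] for sqrt(2)
    long long p0 = 1, q0 = 1;   // first convergent  1/1
    long long p1 = 3, q1 = 2;   // second convergent 3/2
    for (int i = 0; i < 8; ++i) {
        std::printf("%lld/%lld = %.15f\n", p1, q1, (double)p1 / q1);
        long long p2 = 2 * p1 + p0, q2 = 2 * q1 + q0;   // recurrence with partial quotient 2
        p0 = p1; q0 = q1;
        p1 = p2; q1 = q2;
    }
    return 0;
}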

David Lehavi
Unfortunately numbers such as sqrt(sqrt(2)) can't be represented this way. However, you can represent all constructible numbers as mathematical statements. Arithmetic is quick, and comparisons (like less-than and equals) are guaranteed to finish in a finite amount of time! (If that sounds like less than a ringing endorsement of the sanity of doing computation this way, you're right. But it is surprising that it's possible.) http://stackoverflow.com/questions/1784901/can-coordinates-of-constructable-points-be-represented-exactly
Jason Orendorff
Finish in a finite amount of time: sometimes. Efficient: no, whereas Gosper's algorithm is (when you have a nice representation as a generalized continued fraction).
David Lehavi
+9  A: 

This identity happens to hold when computed as written in IEEE-754 double precision. Here's why:

The square root of two correctly rounded to double precision is:

sqrt(2) = 0x1.6a09e667f3bcd * 2^0

(I'm using hexadecimal here because the representations are tidier and the translation into the IEEE-754 format is much easier.) Multiplication by two is exact in binary floating-point if no overflow occurs, as is the case here, so:

2*sqrt(2) = 0x1.6a09e667f3bcd * 2^1

When we add three, we get:

3 + 2*sqrt(2) = 0x1.7504f333f9de68 * 2^2

This, however, is not a representable double-precision number (it is one bit too wide), so the result is rounded to the nearest representable number. It happens that this value is exactly halfway between two representable numbers, so we pick the one with a trailing zero bit:

3 + 2*sqrt(2) = 0x1.7504f333f9de6 * 2^2

Now the other side of the computation. When we add one to the double-precision square root of two, we get:

1 + sqrt(2) = 0x1.3504f333f9de68 * 2^1

This is also an exact halfway case between two representable double-precision numbers, and again it is rounded to the nearest "even" representable number:

1 + sqrt(2) = 0x1.3504f333f9de6 * 2^1

When this value is squared, the result is:

(1 + sqrt(2))*(1 + sqrt(2)) = 0x1.7504f333f9de599cacbc97eaa4 * 2^2

Which is not a representable double-precision number either. This one is not an exact halfway case, so it merely rounds to the nearest representable number, which is:

(1 + sqrt(2))*(1 + sqrt(2)) = 0x1.7504f333f9de6 * 2^2

Summary: Computing this value in two different ways incurs two different sequences of roundings, but the final result is the same. We only looked at the computation in double precision, however; this may not be the case when the computation is carried out using different arithmetic types.

In general, however, the expression 3 + 2*sqrt(2) should be expected to be the more accurate (in cases where they differ), because it incurs only two roundings (the square root and the add) for any binary IEEE-754 type, whereas (1 + sqrt(2))*(1 + sqrt(2)) incurs three roundings (square root, add, and multiply). It should also be noted that the difference between the two will be at most one or two bits, and is probably negligible for your purposes.
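A quick sketch to reproduce the hexadecimal values above, using C's %a format (any IEEE 754 double platform should print the same significands):

#include <cmath>
#include <cstdio>

int main()
{
    double s = std::sqrt(2.0);
    std::printf("sqrt(2)          = %a\n", s);                       // 0x1.6a09e667f3bcdp+0
    std::printf("3 + 2*sqrt(2)    = %a\n", 3.0 + 2.0 * s);           // 0x1.7504f333f9de6p+2
    std::printf("(1 + sqrt(2))^2  = %a\n", (1.0 + s) * (1.0 + s));   // 0x1.7504f333f9de6p+2
    return 0;
}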

Stephen Canon
Thanks for the detailed explanation.
lhf
A: 

I'm going to throw out one more idea -

Yes, it's true that exact equality of real numbers is a meaningless concept in computer programming.

But it's also true that exact equality of real numbers is a meaningless concept in our physical reality.

Integers in our physical reality are the result of counting. Real numbers in our physical reality are the result of measurement. And all measurements include errors. To say that two physical measurements have precisely the same value is nonsense. At best, two physical measurements, rounded to some level of precision appropriate to the accuracy that the measurement is capable of, are equal.

When you measure the length of a pencil with a ruler, you get a length to the nearest 16th of an inch. When you measure it with a pair of calipers, you get a length to the nearest 1000th of an inch. Real world measurements always include this sort of rounding. When you simulate real world measurements in a computer program, you need to do the same.

Equality of real numbers is a meaningful concept only for mathematicians. (And even there, it's a different, and more complicated concept, than equality of integers).

Jeff Dege
...but computers round floating point calculations to the nearest ulp (see http://en.wikipedia.org/wiki/Unit_in_the_last_place)
Jason S
But they don't round to the precision that is appropriate for the problem. You have to do that.
Jeff Dege
+1  A: 

sqrt(2) doesn't have an exact representation in binary.

sqrt(2) doesn't have an exact representation in decimal, hex, or any other base-n system either; it is an irrational number.

The only exact representation of sqrt(2) is sqrt(2). Or, as a solution to the equation x^2 = 2.

Seth
Or as [1;2,2,2,...] =)
Stephen Canon
I prefer 1 + 1/(2 + 1/(2 + 1/(2 + 1/(2 + 1/...)))), which just looks like lisp when written here :P
Seth
A: 
Loadmaster