How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate?
Do you have a favourite example or anecdote which seems to get the idea across much better than a precise, but dry, explanation?
How is this taught in Computer Science classes?

+6  A: 

There are basically two major pitfalls people stumble into with floating-point numbers.

  1. The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number, so you can represent either really small values or really large ones, though the number of digits you can devote to it is limited. Adding two numbers of different scale will sometimes result in the smaller one being “eaten”, since there is no way to fit it into the larger scale.

    PS> $a = 1; $b = 0.0000000000000000000000001
    PS> Write-Host a=$a b=$b
    a=1 b=1E-25
    PS> $a + $b
    1
    

    As an analogy for this case you could picture a large swimming pool and a teaspoon of water. The two are of very different sizes, but individually you can easily grasp roughly how much each holds. Pouring the teaspoon into the swimming pool, however, will still leave you with roughly a swimming pool full of water.

    (If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
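
    The same thing can be shown in Python with the value written out in full (a minimal check; the True result assumes IEEE 754 double precision, which CPython uses on virtually all platforms):

    >>> # 10^20 written out in full; 1.0 is far below the precision available at this scale
    >>> 100000000000000000000.0 + 1.0 == 100000000000000000000.0
    True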

  2. Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a finite number of binary digits. Some languages mask this, though:

    PS> "{0:N50}" -f 0.1
    0.10000000000000000000000000000000000000000000000000
    

    But you can “amplify” the representation error by repeatedly adding the numbers together:

    PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
    9,99999999999998
    

    I can't think of a nice analogy to properly explain this, though. It's essentially the same reason you can represent 1/3 only approximately in decimal: to get the exact value you would need to repeat the 3 indefinitely after the decimal point.

    Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
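
    Instead of masking the error you can also make the hidden digits visible. In Python 3, the standard `decimal` module can print the exact value the double actually stores (again assuming IEEE 754 doubles):

    >>> from decimal import Decimal
    >>> # Converting the float to Decimal preserves its exact stored value
    >>> Decimal(0.1)
    Decimal('0.1000000000000000055511151231257827021181583404541015625')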

  3. Then there is another problem, though most people don't stumble into it unless they're doing huge amounts of numerical work. But then, those people already know about it. Since many floating-point numbers are merely approximations of the exact value, for a given approximation f of a real number r there can be infinitely many real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say rmin is the minimum possible value of r that results in f and rmax the maximum possible value for which this holds; then you get an interval [rmin, rmax] where any number in that interval can be your actual number r.

    Now, if you perform calculations on that number (adding, subtracting, multiplying, etc.) you lose precision. Every number is just an approximation, so you're actually performing calculations with intervals. The result is an interval too, and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from the calculation, but it is merely one number from the interval of possible results, taking into account the precision of your original operands and the precision loss due to the calculation.

    That sort of thing is called interval arithmetic, and at least for me it was part of our math course at university.
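
    The idea is easy to sketch in Python. This is a toy illustration only (the bounds here were chosen to be exactly representable in binary so the output is clean, and a real interval library would also round the bounds outward and handle multiplication and division carefully):

    >>> def add(a, b):
    ...     # Adding intervals adds matching endpoints; the uncertainties add up
    ...     return (a[0] + b[0], a[1] + b[1])
    ...
    >>> def sub(a, b):
    ...     # Subtracting pairs each bound with the other's opposite bound
    ...     return (a[0] - b[1], a[1] - b[0])
    ...
    >>> a = (0.75, 1.25)   # "1", known to within +/- 0.25
    >>> b = (1.75, 2.25)   # "2", known to within +/- 0.25
    >>> add(a, b)          # the result is twice as uncertain as the inputs
    (2.5, 3.5)
    >>> sub(a, b)          # even subtraction widens the interval
    (-1.5, -0.5)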

Joey
Hi Johannes, that is definitely a good example, but it doesn't really tell people *why* it doesn't work. I'm looking to make someone understand the reason for the failure, not just the fact that it fails every now and again.
David Rutten
Hm, other than explaining the problem of scale and the problem of binary vs. decimal representation, I haven't found a better way to tell this to people :/. One might use similar anecdotes, such as how adding a teaspoon of water to a swimming pool doesn't change our perception of how much is in it.
Joey
To elaborate, many of the people I get in workshops aren't even very comfortable with scientific notation, so they already require a fair amount of mental effort to wrap their heads around the difference between -4e200, -4e-200, 4e-200 and 4e200.
David Rutten
See, that swimming pool analogy is exactly the kind of thing I'm looking for!
David Rutten
@David: Ok, incorporated that into the answer and elaborated a bit as well. Still, finding suitable analogies and easily-understood explanations isn't easy.
Joey
Thanks Johannes, great answer.
David Rutten
+2  A: 

In Python:

>>> 1.0 / 10
0.10000000000000001

Explain how some fractions cannot be represented precisely in binary. Just like some fractions (like 1/3) cannot be represented precisely in base 10.
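
To make it concrete you can ask Python which fraction actually got stored. The standard `fractions` module shows the exact value (the numbers assume IEEE 754 doubles):

>>> from fractions import Fraction
>>> # The denominator is a power of two: the closest such fraction to 1/10
>>> Fraction(0.1)
Fraction(3602879701896397, 36028797018963968)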

codeape
codeape, I'm looking for something a bit deeper than just parading examples of rounding errors. I'd like to be able to tell people why these errors creep in, and have them understand the reason behind it, without needing to understand the IEEE 754 specification.
David Rutten
@David: give them an example where floating point numbers are exact, such as adding 0.25 multiple times. The result will be exact until you overflow the mantissa, because 0.25 is `1/(2^2)`. Then try the same thing with 0.2 and you will get the problems, because 0.2 isn't representable in a finite base-2 number.
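
For instance (a quick check; the True/False results assume IEEE 754 doubles):

>>> sum([0.25] * 100) == 25.0   # every partial sum is exactly representable
True
>>> sum([0.2] * 100) == 20.0    # the tiny representation errors accumulate
False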
Joachim Sauer
+5  A: 

Take a look at this article: What Every Computer Scientist Should Know About Floating-Point Arithmetic

Rubens Farias
Thanks Rubens, that seems to be an excellent article. I'm sure I can get some good ideas from it.
David Rutten
+4  A: 

How's this for an explanation to the layman? One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers, those without a fractional part, modern digital computers count powers of two: 1, 2, 4, 8, ... Place value, binary digits, blah, blah, blah. For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) increases the precision of the representation of those 'problem' numbers, but never gets them exactly, because some numbers simply can't be represented by any finite number of bits.

Snooze...

OK, you want to measure the volume of water in a container, and you only have three measuring cups: a full cup, a half cup, and a quarter cup. After counting out the last full cup, let's say a third of a cup remains. Yet you can't measure that, because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and after filling the quarter cup the amount left over is too small to fill anything. So you have an error: the difference between 1/3 and 1/4. This error compounds when you combine it with errors from other measurements.
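
You can even act the cup-measuring out in code. Here is a small Python sketch (illustration only) that uses exact fractions, with cups that are inverse powers of two, just like the bits:

>>> from fractions import Fraction
>>> water = Fraction(1, 3)        # a third of a cup left to measure
>>> cups_used = []
>>> for k in range(1, 9):         # cups of size 1/2, 1/4, ..., 1/256
...     cup = Fraction(1, 2 ** k)
...     if cup <= water:          # this cup fits: pour it off
...         water -= cup
...         cups_used.append(cup)
...
>>> cups_used
[Fraction(1, 4), Fraction(1, 16), Fraction(1, 64), Fraction(1, 256)]
>>> water                         # a leftover always remains, however far you go
Fraction(1, 768)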

gary
+3  A: 

Show them that the base-10 system suffers from exactly the same problem.

Try to write 1/3 as a decimal fraction in base 10. You won't be able to do it exactly.

So if you write "0.3333", you will have a reasonably exact representation for many use cases.

But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".

Other fractions, such as 1/2, can easily be represented by a finite decimal representation in base-10: "0.5"

Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly.

While base-10 has no problem representing 1/10 as "0.1", in base-2 you'd need an infinitely repeating representation starting with "0.000110011...".
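
You can generate those digits yourself with ordinary long division. A short Python sketch (the `digits` helper is hypothetical; it works for any fraction and base):

>>> def digits(num, den, base, count):
...     """First `count` fractional digits of num/den written in `base`."""
...     out = []
...     for _ in range(count):
...         num *= base
...         out.append(num // den)    # next digit, long-division style
...         num %= den
...     return "".join(str(d) for d in out)
...
>>> digits(1, 3, 10, 12)    # 1/3 repeats forever in base 10
'333333333333'
>>> digits(1, 10, 2, 12)    # 1/10 repeats forever in base 2
'000110011001'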

Joachim Sauer