I'm having a problem with the specific set of values in this code snippet.

double inputs[] = {0, -546543, 99015, 6750, 825, 2725, 70475, 
    50950, 42200, 6750, 26925, 16125, 134350, 10075, 79378};
double result = 0;
for (int i = 0; i < 15; i++) {
    result += inputs[i]/100;
}

I expected the final value of result to be 0. And if I take out the division by 100 it is. But when I divide each value by 100 before adding it to result, I end up with -6.8212102632969618e-013 instead.

There's a lot I don't understand about floating point arithmetic. I know it's not guaranteed to be fully precise. But there doesn't seem to be anything unusual about this dataset (no very large or very small values), so I'm surprised that the calculation is coming out wrong.

Can anybody explain this to me, and offer any suggestions as to how to avoid this problem? The code I presented is simplified; in the actual code I can't just not divide by 100, and I can't very easily add the numbers as integers and divide them later.

Any suggestions will be appreciated.

+4  A: 

I can't very easily add the numbers as integers and divide them later.

Why not? That sounds like exactly the solution to your problem. Adding integers and dividing once is also likely to be much faster than adding floating-point numbers and dividing every time through the loop.

You are just accumulating error every time you divide by 100 (since 100 is not a power of 2). All of your numbers are exactly representable in a double, but once you divide them, some of the quotients aren't - hence your error. There's not really anything you can do about it except modify your algorithm.

In your case, since you're dividing by 100, you could round off the final sum to the nearest 100th, and get the right result.

Carl Norum
You're doing what I would do, but he does address it in the first part of his sentence: `The code I presented is simplified; in the actual code [...] I can't very easily add the numbers as integers and divide them later.`
egrunin
@egrunin, but he doesn't explain why he can't.
Carl Norum
@Carl Norum, true, I didn't explain why because I didn't want to clutter up the post. I've got one method that reads data from a text file with no decimal points (it's generated by a COBOL program!) and assigns those values as doubles to fields on an object, and then a much later method that adds those values, but only under certain circumstances. So if I don't divide it immediately after reading it, I'll spend the whole program trying to remember whether I've divided it or not. It's simpler just to do it at the point where I read in the data.
John M Gant
@John, then it sounds like only the solution in my 3rd paragraph will solve your problem.
Carl Norum
A: 

If you have to use floating point numbers, you might want to:

  • first add the positive and the negative ones separately and then subtract the sums

  • divide once by 100 after calculating the sums

As far as I recall, it is not guaranteed that the floating point registers in the CPU and the storage locations in memory are the same width (i.e. some rounding probably happens when you transfer a value into a location with less width), and where your values are stored depends on what machine code the compiler produces.

Andre Holzner
The width of the registers isn't the issue here, it's that random integers divided by 100 can't be exactly expressed in binary floating point. Since they're inexact, doing operations on them will likely give you inexact answers.
David Thornley
+1  A: 

What you're probably not considering is that double uses base 2 for its internal representation. So although the values don't look very precise or very small at first glance, they may be surprisingly hard to represent in base 2.

Take 1/3 = 0.333... for example. You can't precisely describe the value with a finite number of digits in base-10 (except when storing it as a fraction, but let's leave that aside). The same is true for some values in base-2, that look just fine as base-10 numbers.

0.01 is such an example. To store it in base 2, you would need an infinite number of digits, and thus 1.0/100.0 isn't precise (as float or double). You divide base-10 integers by 100, so you can probably see where this leads.

Some programming languages offer a base-10 floating-point type (decimal in C# for example) for financial calculations. It's better suited for the base-10 arithmetic we are used to, but more expensive for the computer. Still, there are of course numbers that can't be represented even then (1/3, ...).

Gnafoo
A: 

Read The Floating-Point Guide, then you will understand:

internally, computers use a format (binary floating-point) that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.

Basically, every single one of your intermediate values (after dividing by 100) has a rounding error. It's not unusual at all that the final result retains an error as well.

Decimal numbers cannot accurately represent a number like 1/3, so you have to round to something like 0.33 - and you don’t expect 0.33 + 0.33 + 0.33 to add up to 1, either - do you?

What can I do to avoid this problem?

That depends on what kind of calculations you’re doing.

  • If you really need your results to add up exactly, especially when you work with money: use a special decimal datatype.
  • If you just don’t want to see all those extra decimal places: simply format your result rounded to a fixed number of decimal places when displaying it.
  • If you have no decimal datatype available, an alternative is to work with integers, e.g. do money calculations entirely in cents. But this is more work and has some drawbacks.

Note that the first case (needing specific decimal results) does not apply in most cases.

Michael Borgwardt
+1  A: 

Normally, when comparing floating point numbers from different implementations, you can establish some sort of pass/fail criterion for determining equality (or zero). Something like 10^-10 for double precision and maybe 10^-6 for single. Some such benchmark is required if, for example, you're comparing Matlab results to those generated by application code, and it's an acknowledgment of unavoidable precision errors.

rlduffy
+1, good idea. I think that will work for me.
John M Gant
@John: note that comparing floating point values using such epsilon values can be far more complex than it seems. See http://floating-point-gui.de/errors/comparison/
Michael Borgwardt
@Michael: +1 for the good link. What I neglected to mention is that the thresholds should probably only serve to identify areas that warrant further analysis.
rlduffy