I was wondering if there is a way of overcoming an accuracy problem that seems to be the result of my machine's internal representation of floating-point numbers:

For the sake of clarity the problem is summarized as:

// str is "4.600";   atof( str ) is 4.5999999999999996  
double mw = atof( str );  

// The variables used in the columns calculation below are:   
//  
//                    mw = 4.5999999999999996  
//                    p = 0.2  
//                    g = 0.2  
//                    h = 1 (integer)  

int columns = (int) ( ( mw - ( h * 11 * p ) ) / ( ( h * 11 * p ) + g ) ) + 1;

Prior to casting to an integer type the result of the columns calculation is 1.9999999999999996; so near yet so far from the desired result of 2.0.

Any suggestions most welcome.

+1  A: 

Use decimals: decNumber++

Can Berk Güder
Does that solve the 3*(1/3) problem? Or only the 10*(1/10) problem?
MSalters
-1, for exactly the reason MSalters gave. Decimal numbers are useful for working with money not because they have superior precision but because your imprecise calculations will be identical to everyone else's. In all other respects decimal numbers suffer from the exact same problems.
j_random_hacker
Although there are some libraries that store fractions. 4.6 would be 4 + 3/5 in one of those. They only fall apart when given an operation impossible to manage as a fraction, like multiplying by pi.
Zan Lynx
They solve the OP's problem, isn't that the point?
Can Berk Güder
@Can: They may solve this particular instance, but there definitely exist values of mw, p, g and h for which the exact same problem will recur. That's what makes this solution a hack -- it only works for a few cases, not for all cases.
j_random_hacker
@Zan: Yes, a rational number library would solve the problem, since it can *exactly* represent *any* value that that code snippet could produce. (As you said, if the code was changed to use irrational numbers (e.g. by calculating square roots or trig functions etc.) this would no longer be true.)
j_random_hacker
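
To illustrate the rational-number idea raised in the comments above, here is a minimal sketch, assuming C++17 for std::gcd. The `Rational` struct and the helper names (`make_rational`, `floor_of`, `columns_exact`) are hypothetical and written only for this example; they are not taken from decNumber++ or any other library, and overflow is not handled.

```cpp
#include <cassert>
#include <numeric>  // std::gcd (C++17)

// Hypothetical minimal rational type for illustration only; no overflow checks.
struct Rational {
    long long num;
    long long den;  // always kept positive
};

// Build a fraction in lowest terms with a positive denominator.
Rational make_rational(long long n, long long d) {
    assert(d != 0);
    if (d < 0) { n = -n; d = -d; }
    const long long g = std::gcd(n, d);  // nonzero because d != 0
    return Rational{n / g, d / g};
}

Rational operator+(Rational a, Rational b) {
    return make_rational(a.num * b.den + b.num * a.den, a.den * b.den);
}
Rational operator-(Rational a, Rational b) {
    return make_rational(a.num * b.den - b.num * a.den, a.den * b.den);
}
Rational operator*(Rational a, Rational b) {
    return make_rational(a.num * b.num, a.den * b.den);
}
Rational operator/(Rational a, Rational b) {
    assert(b.num != 0);
    return make_rational(a.num * b.den, a.den * b.num);
}

// Largest integer not greater than r.
long long floor_of(Rational r) {
    long long q = r.num / r.den;  // integer division truncates toward zero
    if (r.num % r.den != 0 && r.num < 0) --q;
    return q;
}

// The question's formula, evaluated exactly on fractions.
long long columns_exact(Rational mw, Rational p, Rational g, long long h) {
    const Rational span = make_rational(h * 11, 1) * p;  // h * 11 * p
    return floor_of((mw - span) / (span + g)) + 1;
}
```

With mw = 4600/1000, p = g = 2/10 and h = 1, the ratio is exactly (12/5) / (12/5) = 1, so `columns_exact` returns 2. As noted in the comments, this only helps while every operation stays rational.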
+2  A: 

You can read this paper to find what you are looking for.

Instead of testing for exact equality, you can check whether the absolute value of the difference is below a small tolerance, as seen here:

x = 0.2;  
y = 0.3;  
equal = (Math.abs(x - y) < 0.000001)
jmein
+11  A: 

If you haven't read it, the title of this paper is really correct. Please consider reading it, to learn more about the fundamentals of floating-point arithmetic on modern computers, some pitfalls, and explanations as to why they behave the way they do.

unwind
That's a great article.
Scottie T
+13  A: 

When you use floating point arithmetic strict equality is almost meaningless. You usually want to compare with a range of acceptable values.

Note that some values cannot be represented exactly as floating point values.

See What Every Computer Scientist Should Know About Floating-Point Arithmetic and Comparing floating point numbers.
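
As a rough sketch of comparing within a range of acceptable values: the function name `nearly_equal` and the default tolerances below are assumptions chosen for illustration, and sensible tolerances depend on the magnitudes in your application.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Returns true when a and b differ by no more than a tolerance that scales
// with their magnitude (rel_tol), with an absolute floor (abs_tol) so that
// comparisons near zero still behave sensibly.
// The name and the default tolerances are illustrative assumptions.
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 1e-12) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```

For example, `nearly_equal(0.1 + 0.2, 0.3)` holds even where `0.1 + 0.2 == 0.3` does not, while `nearly_equal(1.0, 1.001)` is false with these defaults.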

kmkaplan
+3  A: 

A very simple and effective way to round a floating point number to an integer:

int rounded = (int)(f + 0.5);

Note: this only works if f is always positive. (thanks j random hacker)
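
A minimal sketch of the same idea extended to negative values, using the sign adjustment suggested in the comments. The function name `round_to_int` is made up for this example, and the sketch assumes the result fits in an int.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Round to nearest, with halves rounded away from zero.
// (f < 0) converts to 0 or 1, so the offset is +0.5 for non-negative f
// and -0.5 for negative f; the cast then truncates toward zero.
// Hypothetical helper name; assumes the result fits in an int.
int round_to_int(double f) {
    assert(f > INT_MIN && f < INT_MAX);
    return static_cast<int>(f + (0.5 - (f < 0)));
}
```

For example, `round_to_int(1.9999999999999996)` gives 2 and `round_to_int(-2.6)` gives -3. Since C++11, `std::lround` from `<cmath>` provides the same round-half-away-from-zero behaviour and returns a `long`.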

Kip
Assuming f is positive.
j_random_hacker
Yes "columns" is always positive in this application.
AndyUK
@j_random_hacker - you could use absolute value, in theory.
Moshe
@Moshe: Not sure abs() would buy you much, since presumably you want the final answer to have the original sign and that will mean you need to "invert" the abs() by multiplying by the original sign. Probably simpler just to replace the `0.5` by `(0.5 - (f < 0))`.
j_random_hacker
@j_random_hacker - To be honest, I don't understand that last bit of code you've posted, but yes, that is a valid point.
Moshe
@Moshe: It's unnecessarily cryptic but I thought it was cute... :) If `f` is positive or 0, `(f < 0)` is `0` so the whole expression evaluates to `0.5` as before, so rounding of positive numbers is unaffected; but if `f` is negative, `(f < 0)` evaluates to `1`, which is then subtracted from `0.5` to give `-0.5`, which will cause negative numbers to be rounded-to-nearest as well.
j_random_hacker
+9  A: 

There's no accuracy problem.

The result you got (1.9999999999999996) differed from the mathematical result (2) by a margin of about 4E-16. That's quite accurate, considering your input "4.600".

You do have a rounding problem, of course. Converting a floating-point value to an integer in C++ truncates toward zero; you want something similar to Kip's solution. The details depend on your exact domain: do you expect round(-x) == -round(x)?
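
One way this could look for the calculation in the question, as a sketch only. It assumes the intent is "how many whole columns fit", so rounding to nearest would over-count a ratio such as 1.6; instead it takes the floor after nudging the ratio up by a small tolerance. The function names and the default tolerance of 1e-9 are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// The ratio part of the question's formula, before the cast and the + 1.
double column_ratio(double mw, double p, double g, int h) {
    const double span = h * 11 * p;
    return (mw - span) / (span + g);
}

// Floor of the ratio, except that values within tol below an integer are
// treated as that integer. Hypothetical helper; tol is an assumed default
// and must be far smaller than the spacing between meaningful ratios.
int columns_with_tolerance(double mw, double p, double g, int h,
                           double tol = 1e-9) {
    assert(tol >= 0.0);
    const double ratio = column_ratio(mw, p, g, h);
    return static_cast<int>(std::floor(ratio + tol)) + 1;
}
```

With the question's inputs (mw = 4.6, p = g = 0.2, h = 1) this yields 2, whereas a ratio that is genuinely between integers, such as the one for mw = 5.0, is still rounded down.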

MSalters
+1 for noticing the real problem.
David Thornley
+3  A: 

If accuracy is really important then you should consider using double precision floating point numbers rather than single precision, though from your question it appears that you already are. However, you still have a problem with checking for specific values. You need code along the lines of (assuming you're checking your value against zero):

if (fabs(value) < epsilon)  // fabs, not abs: in C, abs() takes an int and would truncate value
{
   // Do Stuff
}

where "epsilon" is some small, but non zero value.

ChrisF
I think you mean "abs(computed_value - expected_value) < epsilon". Otherwise you're just checking if the final value is really small; not whether the final value is really close to what it should be.
Max Lybbert
Indeed - but I did mention that the code was an example for checking against zero ;)
ChrisF
+2  A: 

On computers, floating point numbers are never exact. They are always just a close approximation. (1e-16 is close.)

Sometimes there are hidden bits you don't see. Sometimes basic rules of algebra no longer apply: a*b != b*a. Sometimes comparing a register to memory shows up these subtle differences. Or using a math coprocessor vs a runtime floating point library. (I've been doing this waayyy tooo long.)

C99 defines these (look in math.h):

double round(double x);
float roundf(float x);
long double roundl(long double x);


Or you can roll your own:

template<class TYPE> inline int ROUND(const TYPE & x)
{ return int( (x > 0) ? (x + 0.5) : (x - 0.5) ); }

For floating point equivalence, try:

template<class TYPE> inline TYPE ABS(const TYPE & t)
{ return t>=0 ? t : - t; }

template<class TYPE> inline bool FLOAT_EQUIVALENT(
    const TYPE & x, const TYPE & y, const TYPE & epsilon )
{ return ABS(x-y) < epsilon; }
Mr.Ree