
I know, I've read about the difference between double precision and single precision, etc. But they should give the same results in most cases, right?

I was solving a problem in a programming contest, and there were calculations with floating point numbers that were not really big, so I decided to use float instead of double, and I checked it - I was getting the correct results. But when I submitted the solution, it said only 1 of 10 tests was correct. I checked again and again, until I found that using float is not the same as using double. I put double for the calculations and double for the output, and the program gave the SAME results, but this time it passed all 10 tests.

I repeat, the output was the SAME, the results were the SAME, but using float didn't work - only double. The values were not that big either, and the program gave the same results on the same tests with both float and double, but the online judge accepted only the double version.

Why? What is the difference?

+4  A: 
  • A double is 64 bits and single precision (float) is 32 bits.
  • The double has a bigger mantissa (the significant bits of the number).
  • Any inaccuracies will be smaller in the double.
graham.reeds
+4  A: 

Floats have less precision than doubles. Although you already know this, read What Every Computer Scientist Should Know About Floating-Point Arithmetic for a better understanding.

N 1.1
+11  A: 

Here is what the standard C99 (ISO-IEC 9899 6.2.5 §10) or C++2003 (ISO-IEC 14882-2003 3.1.9 §8) standards say:

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.

The C++ standard adds:

The value representation of floating-point types is implementation-defined.

I would suggest having a look at the excellent What Every Computer Scientist Should Know About Floating-Point Arithmetic, which covers the IEEE floating-point standard in depth. You'll learn about the representation details and you'll realize there is a tradeoff between magnitude and precision: the absolute precision of the floating point representation increases as the magnitude decreases, hence floating point numbers between -1 and 1 are the ones with the most absolute precision.

Gregory Pakosz
we pasted the same links?! :D
N 1.1
indeed, it's the obvious one that comes to mind :)
Gregory Pakosz
I think nvl should get the points as Gregory has more points and nvl is just starting out.
graham.reeds
@graham: thanks man! :) but he deserves as much as i do. :P
N 1.1
@graham.reeds fair enough. Gregory may get more votes, though, because he put the title in his link, so his answer is more easily seen to be a good one.
David Gelhar
@Gelhar: ya right! I changed my link too.
N 1.1
@nvl, you got my vote already -- @grahams points don't really matter anyway
Gregory Pakosz
@gregory: amen @ points dont really ..
N 1.1
A: 

When using floating point numbers you cannot trust that your local tests will be exactly the same as the tests that are done on the server side. The environment and the compiler are probably different on your local system and where the final tests are run. I have seen this problem many times before in some TopCoder competitions, especially if you try to compare two floating point numbers.

Tuomas Pelkonen
+1  A: 

"there were calculations with floating point numbers that were not really big"

The size of the numbers is irrelevant; what matters is the calculation that is being performed.

In essence, if you're performing a calculation whose result is an irrational number or a recurring decimal, then there will be rounding errors when that number is squashed into the finite-size data structure you're using. Since double is twice the size of float, the rounding error will be a lot smaller.

The online test probably specifically used numbers which would cause this kind of error and therefore tested that you'd used the appropriate type in your code.

Dolbz
+7  A: 

Huge difference.

As the name implies, a double is twice the size of a float[1]. In general a double has 15 to 16 decimal digits of precision, while float only has 7.

This loss of precision means truncation errors creep in much more easily with float, e.g.

    float a = 1.f / 81;
    float b = 0;
    for (int i = 0; i < 729; ++ i)
            b += a;
    printf("%.7g\n", b);   // prints 9.000023

while

    double a = 1.0 / 81;
    double b = 0;
    for (int i = 0; i < 729; ++ i)
            b += a;
    printf("%.15g\n", b);   // prints 8.99999999999996

Also, the maximum value of float is only about 3e38, while double's is about 1.7e308, so using float can hit infinity much more easily than double for something simple, e.g. computing 60!.

Maybe their test cases contain such huge numbers, causing your program to fail.


Of course, sometimes even double isn't accurate enough, hence we have long double[1] (the above example gives 9.000000000000000066 on a Mac), but all these floating point types suffer from round-off errors, so if precision is very important (e.g. money processing) you should use int or a fraction class.


BTW, don't use += to sum lots of floating point numbers, as the errors accumulate quickly. If you're using Python, use math.fsum. Otherwise, try implementing the Kahan summation algorithm.


[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double precision. Nevertheless, on most architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating point number (binary32), and double is an IEEE double-precision floating point number (binary64).

KennyTM
"don't use += to sum lots of floating point numbers as the errors accumulate quickly" - maybe that's my problem. I want to know why this happens?
VaioIsBorn
@Vaio: Imagine 1000 + 1.6 + 1.6 + 1.6 + 1.6 and you're limited to 4 significant figures. Using += will give the wrong value of 1008 because the 1.6's are forced to round up to 2.
KennyTM
Actually, double has more than 2.2x the precision: 53 bits vs. 24 bits.
bk1e
-1 Don't mistake the properties of your implementation for guarantees that the standard makes. Double is *not* guaranteed to have a higher precision than float, although, it *usually* does. The standard only requires double to be at least as good as float in terms of precision.
sellibitze
The usual advice for summation is to sort your floating point numbers by magnitude (smallest first) before summing.
R..