I have gone through earlier discussions on floating point numbers on SO, but they didn't clarify my problem. I know floating point issues are common in every forum, but my question is not about floating point arithmetic or comparison. I am rather curious about its representation and output with %f.

The question is straightforward: how do I determine the exact output of:

float <Float_Variable> = <Some_Value>f;
printf("%f \n",<Float_Variable>);

Let us consider this code snippet:

float f = 43.2f,
f1 = 23.7f,
f2 = 58.89f,
f3 = 0.7f;

printf("f1 = %f\n",f);
printf("f2 = %f\n",f1);
printf("f3 = %f\n",f2);
printf("f4 = %f\n",f3);

Output:

f1 = 43.200001
f2 = 23.700001
f3 = 58.889999
f4 = 0.700000

I am aware that %f (which is meant for double) has a default precision of 6. I am also aware that the problem (in this case) can be fixed by using double, but I am curious about the outputs f2 = 23.700001 and f3 = 58.889999 with float.

EDIT: I am aware that floating point numbers cannot be represented precisely, but what is the rule for obtaining the closest representable value?

Thanks,

+4  A: 

What Every Computer Scientist Should Know About Floating-Point Arithmetic

You may also be interested in other people's questions about this on SO. Please take a look:

http://stackoverflow.com/search?q=floating+point

S.Mark
I have that PDF document with me. I haven't gone through the entire piece, but I don't think the document has what I am looking for. If I am missing something, can you suggest where to look in that PDF?
nthrgeek
@Debanjan The point is that the values 23.7 and 58.89 are not exactly representable as floats. What you are seeing printed out is the 'closest' representable value (with the default precision used by printf also of course). It is NOT the case that the "problem can be fixed by using double" as there will be values that aren't exactly representable using a double precision floating point as well.
imaginaryboy
@imaginaryboy: Yes I am aware of it. I just want to know how to obtain the closest representable value ?
nthrgeek
+2  A: 

You can control the number of decimal places that are printed by including a precision in the format specifier.

So instead of having

float f = 43.2f;
printf("f1 = %f\n",f);

Have this

float f = 43.2f;
printf("f1 = %.2f\n",f);

to print two digits after the decimal point.

Do note that floating point numbers are not precisely represented in memory.

Shoko
No, this is not what I am looking for. I don't want to fix it; I just want to know how that conversion takes place. Are there precise rules?
nthrgeek
A: 

The compiler and CPU use IEEE 754 to represent floating point values in memory. Most rational numbers cannot be expressed exactly in this format, so the compiler chooses the closest approximate representation.

To avoid unpredictable output, you should round to the appropriate precision.

// outputs "0.70"
printf("%.2f\n", 0.7f);
Matthew
A: 

A floating point number or a double precision floating point number is stored as an integer numerator, and a power of 2 as denominator. The math behind it is pretty simple. It involves shifting and bit testing.

So when you declare a constant in base 10, the compiler converts it to a binary integer stored in 23 bits (24 significant bits, counting the implicit leading 1) and an exponent in 8 bits (or, for a double, a 52-bit integer and an 11-bit exponent).

To print it back out, it converts this fraction back into base 10.

UncleO
+3  A: 

A 32-bit float (as in this case) is represented as 1 bit of sign, 8 bits of exponent and 23 bits of the fractional part of the mantissa.

First, forget the sign of what you put in. Then the rest of what you put in will be stored as a fraction of the format

(1 + x/8,388,608) * 2^(y-127) (note that 8,388,608 is 2^23), where x is the fractional mantissa and y is the exponent. Believe it or not, there is only one representation in this form for every value you put in. The value stored will be the closest value to the number you want; if your value cannot be represented exactly, it means you'll pick up an extra .0001 or whatever.

So, if you want to figure out the value that will actually be stored, just figure out what it will turn into.

So the second thing to do (after throwing out the sign) is to find the largest power of 2 that is no larger in magnitude than the number you are representing. So let's take 43.2.

The largest power of two not exceeding that is 32. So that's the "1" on the left; since it's a 32, not a 1, the 2^ value on the right must be 2^5 (32), which means y is 132. So now subtract off the 32, it's done for. What's left is 11.2. Now we need to represent 11.2 as a fraction over 8,388,608 times 2^5.

So

11.2 approx equals x*32/8,388,608, or x/262,144. The value you get for x is 2,936,013. The exact numerator would have been 0.2 lower (2,936,012.8), so there will be an error of 0.2 in 262,144, or 1 in 1,310,720. In decimal, this value is 0.000000762939453125. So if you print enough digits, you'll see this error value show up in your output.

When the output is too low, it's because the rounding went the other way: the representable value nearest to the wanted value was below it, and so you get an output that is too low. When the value can be represented exactly (for example, any power of 2 within range), you never get an error.

It's not simple, but there you go. I'm sure you can code this up.

*note: for values very small in magnitude (less than 2^-126) you get into weirdness called denormals. I'm not going to explain them, but they won't fit the pattern. Luckily they don't show up much. And once you get into that range, your accuracy goes to pot anyway.

Southern Hospitality
+6  A: 

Assuming that you're talking about IEEE 754 float, which has a precision of 24 binary digits: represent the number in binary (exactly) and round the number to the 24th most significant digit. The result will be the closest floating point.

For example, 23.7 represented in binary is

10111.1011001100110011001100110011...

After rounding you'll get

10111.1011001100110011010

Which in decimal is

23.700000762939453125

After rounding to the sixth decimal place, you'll have

23.700001

which is exactly the output of your printf.

avakar
+1 I worked the same example for the questioner before I noticed that you had already done it. Doh!
Stephen Canon
This is correct and it's how I personally do it. But I don't know it's all that useful because if this person understood binary decimals (ouch, I hate that description!) he probably wouldn't have asked the question in the first place.
Southern Hospitality
A: 

Gross simplification: the rule is that "floats are good for 2 or 3 decimal places, doubles for 4 or 5". That is to say, the first 2 or 3 decimal places printed will be exactly what you put in. After that, you have to work out the encoding to see what you're going to get.

This is only a rule of thumb, and as it happens your test case shows one instance where the float representation is only good to 1 d.p.

Matt Gordon
A: 

The way to figure out what will be printed is to simulate exactly what the compiler / libraries / hardware will do:

  1. Convert the number to binary, and round to 24 significant (binary) digits.
  2. Convert that number to decimal, and round to 6 (decimal) digits after the decimal point.

Of course, this is exactly what your program does already, so what are you asking for?

Edit to illustrate, I'll work through one of your examples:

Begin by converting 23.7 to binary:

10111.1011001100110011001100110011001100110011001100110011...

Round that number to 24 significant binary digits:

10111.1011001100110011010

Note that it rounded up. Converting back to decimal gives:

23.700000762939453125

Now, round this value to 6 digits after the decimal point:

23.700001

Which is exactly what you observed.

Stephen Canon
Hah, didn't notice until after I edited that avakar already worked this same example. I didn't copy his, really! =)
Stephen Canon