After an hour of hunting for a bug in my code, I've finally found the cause. I was trying to add a very small float to 1f, but nothing was happening. While trying to figure out why, I found that adding the same small float to 0f worked perfectly.

Why is this happening? Does this have to do with 'orders of magnitude'? Is there any workaround to this problem?

Thanks in advance.

Edit:

Changing to double precision or decimal is not an option at the moment.

+3  A: 

It looks like it has something to do with floating point precision. If I were you, I'd use a different type, like decimal. That should fix precision errors.

Alex Fort
This is almost always the answer to "why don't my floating point values do what I think they should". The deeper answer is that binary fractions don't map well to decimal ones.
Harper Shelby
In this case it's clearly a precision problem, though, not one of decimal vs. binary fractions.
Joey
+7  A: 

Floating-point arithmetic

Rex M
+4  A: 

This probably happens because the number of digits of precision in a float is constant, but the exponent can obviously vary.

This means that although you can add your small number to 0, you cannot expect to add it to a number with a much larger exponent, such as 1. The result has to be stored with the larger number's exponent, and there just won't be enough digits of precision left to hold the small part.
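To make the "constant digits, varying exponent" point concrete, here is a rough sketch that measures the gap between neighbouring single-precision values at two magnitudes. Python's own float is a double, so the sketch goes through the struct module to get 32-bit values; the helper names next_float32_up and spacing are made up for this illustration.

```python
import struct

def next_float32_up(x: float) -> float:
    """Smallest single-precision value above x (assumes x is finite and >= 0)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # For non-negative floats, adding 1 to the bit pattern gives the next value up.
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

def spacing(x: float) -> float:
    """Gap between x (rounded to single precision) and the next float above it."""
    x32 = struct.unpack("<f", struct.pack("<f", x))[0]
    return next_float32_up(x32) - x32

gap_near_tiny = spacing(1e-8)  # 2**-50, roughly 8.9e-16
gap_near_one = spacing(1.0)    # 2**-23, roughly 1.2e-7

print(gap_near_tiny, gap_near_one)
```

Anything much smaller than half the gap near 1.0 simply cannot change the result of an addition to 1.0, while the same amount is easily representable near 0.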

You should read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

unwind
+19  A: 

Because the precision of a single-precision (32-bit) floating-point value is around 7 significant decimal digits in total, not 7 digits after the decimal point. That means the value you are adding is essentially zero, at least when added to 1. The value itself, however, can effortlessly be stored in a float, since the exponent is small in that case. But to successfully add it to 1 you have to use the exponent of the larger number ... and then the digits after the zeroes disappear in rounding.
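A small sketch of the effect described above. The asker's exact value isn't given, so 1e-8 is an assumed stand-in for "a very small float", and f32 is a hypothetical helper that rounds a Python double to single precision via struct.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

tiny = f32(1e-8)  # assumed example value, not taken from the question

added_to_zero = f32(0.0 + tiny)  # stored with its own small exponent
added_to_one = f32(1.0 + tiny)   # must be stored with the exponent of 1.0

print(added_to_zero == tiny)  # True: nothing is lost
print(added_to_one == 1.0)    # True: the small value is rounded away

# The smallest step a float can take upward from 1.0 is 2**-23 (about 1.2e-7).
step = 2.0 ** -23
print(f32(1.0 + step) > 1.0)  # True
```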

You can use double if you need more precision. Performance-wise this shouldn't make a difference on today's hardware, and memory is usually not so constrained that you have to think about every single variable.

EDIT: Since you stated that using double is not an option, you could try sorting the values you are adding beforehand (assuming you are trying to calculate a sum). Summing the small values first lets them accumulate into something large enough to register before the large values are added, which reduces the rounding error in the final result. It can still happen, though.
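A rough sketch of the sorting idea, with made-up data (one 1.0 and 100,000 copies of 1e-8). The helpers f32 and sum_f32 are hypothetical and only emulate float arithmetic by rounding every intermediate result to single precision.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Add values left to right, rounding to single precision after each step."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

values = [1.0] + [1e-8] * 100_000  # assumed example data

large_first = sum_f32(sorted(values, key=abs, reverse=True))
small_first = sum_f32(sorted(values, key=abs))

print(large_first)  # 1.0: every small term was rounded away
print(small_first)  # close to 1.001
```

The small-first result still carries some rounding error, so this reduces the problem without removing it.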

Another option may be to perform intermediary calculations in double-precision and afterwards cast to float again. This will only help, however, when there are a few more operations than just adding a very small number to a larger one.
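As a sketch of that second option: keep the inputs and the final result in single precision, but do the accumulation in double. Python's float is already a double, so only the inputs and the result are rounded here; f32 and both sum functions are hypothetical names, and the data is made up.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_float_only(values):
    """Every intermediate result is rounded to single precision."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

def sum_in_double(values):
    """Inputs are single precision, the running total is a double."""
    total = 0.0
    for v in values:
        total += f32(v)
    return f32(total)  # cast back to single precision once, at the end

values = [1.0] + [1e-8] * 100_000  # assumed example data

float_only = sum_float_only(values)   # stays at 1.0
via_double = sum_in_double(values)    # close to 1.001

# With a single addition the final cast throws the small part away again.
single_add = f32(1.0 + f32(1e-8))     # 1.0

print(float_only, via_double, single_add)
```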

Joey
I once had an odd conversation with an actuary along these lines. I eventually had to say we could implement our own floating-point handling, but it would take 10 days to build, 10 to fit, and it would bring our run time from 2-3 seconds to 15-20 minutes. Suddenly he could live with the rounding error :)
Binary Worrier
Instead of sorting you'd better use Kahan Summation.
Adrian
+3  A: 

With float, you only get an accuracy of about seven significant digits, so your sum will be rounded to 1f. If you want to store such a number, use double instead.

http://msdn.microsoft.com/en-us/library/ayazw934.aspx

Vimvq1987
A: 

You're using the f suffix on your literals, which will make these floats instead of doubles. So your very small float will vanish in the bigger float.

Dave Van den Eynde
I explicitly added the 'f' suffix to make it clear that I was using floats instead of doubles/decimals.
Trap
+2  A: 

If performance is an issue (because you can't use double), then binary scaling/fixed-point may be an option. Values are stored as integers, scaled by a large factor (say, 2^16). Intermediate arithmetic is done with (relatively fast) integer operations. The final answer can be converted back to floating point at the end by dividing by the scaling factor.

This is often done if the target processor lacks a hardware floating-point unit.
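A minimal sketch of binary scaling with the 2^16 factor mentioned above. All names are made up for illustration, and Python integers never overflow, whereas a real implementation on fixed-width integers would have to watch the range of intermediate products.

```python
SCALE_BITS = 16
SCALE = 1 << SCALE_BITS  # 2^16 = 65536

def to_fixed(x: float) -> int:
    """Convert to a scaled integer, rounding to the nearest step of 1/65536."""
    return round(x * SCALE)

def from_fixed(n: int) -> float:
    """Convert back to floating point by dividing by the scaling factor."""
    return n / SCALE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two scaled values; the product carries the scale twice."""
    return (a * b) >> SCALE_BITS

one = to_fixed(1.0)       # 65536
small = to_fixed(0.0001)  # 7, i.e. about 0.000107 after rounding
total = one + small       # plain integer addition, nothing is absorbed

print(total, from_fixed(total))
print(from_fixed(fixed_mul(to_fixed(1.5), to_fixed(2.0))))  # 3.0
```

The resolution is fixed at 1/65536 (about 1.5e-5), so a value like 1e-8 would round to zero with this scale; a larger scaling factor would be needed for numbers that small.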

sblair
+3  A: 

In addition to the accepted answer: if you need to sum up many small numbers and some larger ones, you should use Kahan summation.
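A sketch of Kahan (compensated) summation carried out entirely in emulated single precision. The helper f32 is a hypothetical stand-in for float arithmetic, and the data (one 1.0 followed by 100,000 copies of 1e-8) is made up for the example.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum_f32(values):
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

def kahan_sum_f32(values):
    total = 0.0
    comp = 0.0  # running estimate of the low-order part lost so far
    for v in values:
        y = f32(f32(v) - comp)            # re-inject what was lost earlier
        t = f32(total + y)                # low-order bits of y may be lost here
        comp = f32(f32(t - total) - y)    # recover (negated) what was just lost
        total = t
    return total

values = [1.0] + [1e-8] * 100_000  # assumed example data, large value first

naive = naive_sum_f32(values)
kahan = kahan_sum_f32(values)

print(naive)  # 1.0
print(kahan)  # close to 1.001
```

No sorting is needed here: the compensation term collects the small contributions until they are big enough to show up in the total.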

Adrian