So I'm trying to figure out why the modulo operator is returning such an unusually large value.

If I have the code:

double result = 1.0d % 0.1d;

it will give a result of 0.09999999999999995. I would expect a value of 0.

Note that this problem doesn't exist with the division operator. The code:

double result = 1.0d / 0.1d;

will give a result of 10.0, meaning that the remainder should be 0.

Let me be clear: I'm not surprised that an error exists, I'm surprised that the error is so darn large compared to the numbers at play. 0.0999 ~= 0.1, so the error is on the same order of magnitude as the operand 0.1d and only one order of magnitude below 1.0d. It's not like you can compare it to double.Epsilon, or say "it's equal if the difference is < 0.00001".

I've read up on this topic on Stack Overflow, in the following posts: one, two, three, amongst others.

Can anyone explain why this error is so large? And are there any suggestions for avoiding this problem in the future? (I know I could use decimal instead, but I'm concerned about its performance.)

Edit: I should specifically point out that I know that 0.1 is an infinitely repeating sequence of digits in binary - does that have anything to do with it?

+1  A: 

It's not precisely an "error" in the calculation; it's the fact that you never really had 0.1 to start with.

The problem is that 1.0 can be represented exactly in binary floating point but 0.1 cannot, because it can't be constructed exactly from negative powers of two. (It's 1/16 + 1/32 + 1/256 + 1/512 + ..., repeating forever.)

So you aren't really getting 1.0 % 0.1; the machine is left to compute 1.0 % (0.1 +/- 0.00...), and then it honestly reports what it got as a result...

In order to get such a large remainder, I suppose the second operand of % must have been slightly over 0.1: the quotient then truncates to 9 instead of reaching 10, and almost an entire 0.1 is left over as the result of the operation.
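
A quick way to confirm what's actually stored (a sketch in C#; the "G17" round-trip format prints enough digits to distinguish any two doubles):

    using System;

    // The literal 0.1d stores the nearest representable double,
    // which is slightly MORE than one tenth.
    Console.WriteLine(0.1d.ToString("G17"));           // 0.10000000000000001

    // The remainder is correspondingly just short of a full 0.1.
    Console.WriteLine((1.0d % 0.1d).ToString("G17"));  // 0.09999999999999995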

DigitalRoss
But why would 1.0 % 0.1 show this error and 1.0 / 0.1 not show the error?
CrimsonX
Also I think you mean "the machine is left to compute 1.0 % 0.0999..."
CrimsonX
The reason 1.0 / 0.1 comes out "right" is that 1.0 / 0.0999... and 1.0 / 0.1 are almost equal -- the function is continuous. The calculation is done with a few extra bits, and the final representable result is the same in both cases. It then looks "right" because we expect the answer to 1.0 / exactly_0.1 to be 10. But remember that the problem presented to the machine was not 1.0 % 0.1 but 1.0 % 0.10000000001 or something, and that calculation really does produce a large remainder that can't be rounded away. (See the numeric sketch after this comment.)
DigitalRoss
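
A numeric sketch of the comment above in C# (the decimal expansion shown for 0.1d is the exact IEEE 754 double; the printed values assume modern .NET's shortest round-trip formatting, and older runtimes may display fewer digits):

    using System;

    // 0.1d really stores 0.1000000000000000055511151231257827...,
    // a hair MORE than one tenth.
    double b = 0.1d;

    // 1.0 / b is about 9.9999999999999994; the nearest double is 10.0,
    // so rounding absorbs the error and division "looks right".
    Console.WriteLine(1.0d / b);  // 10

    // Only nine whole multiples of b fit into 1.0, so % leaves almost
    // a full b behind -- and nothing here rounds it back to 0.
    Console.WriteLine(1.0d % b);  // 0.09999999999999995
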
+1  A: 

The fact that 0.1 cannot be represented exactly in binary has everything to do with it.

If 0.1 could be represented as a double, you would be getting the representable double nearest to the actual result of the operation you want to compute (assuming round-to-nearest mode).

Since it can't, you are getting the representable double nearest to the result of an operation that is entirely different from the one you were trying to compute.

Also note that / is a mostly continuous function: a small difference in the arguments generally means a small difference in the result, and while the derivative can be large near zero, as long as the arguments stay on the same side of zero, additional precision in the arguments helps. On the other hand, % is not continuous: whatever precision you choose, there will always be arguments for which an arbitrarily small representation error in the first argument means a large error in the result.
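
A small C# sketch of that discontinuity: two first arguments differing by only 2e-7, on opposite sides of a multiple of the divisor, give wildly different remainders:

    using System;

    // Just below a multiple of 0.1: the remainder is nearly a full divisor.
    Console.WriteLine(0.9999999 % 0.1);  // ~0.0999999

    // Just above the multiple: the remainder collapses to ~1e-7.
    Console.WriteLine(1.0000001 % 0.1);  // ~0.0000001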

The way IEEE 754 is specified, you only get guarantees on the approximation of the result of a single floating-point operation, assuming the arguments are exactly what you want. If the arguments are not exactly what you want, you need to switch to other solutions, such as interval arithmetic or an analysis of how well-conditioned your program is (if it uses % on floating-point numbers, it is likely not well-conditioned).

Pascal Cuoq
+3  A: 

The error comes about because a double can't exactly represent 0.1 -- the closest it can represent is something like 0.100000000000000005551115123126. Now when you divide 1.0 by that, it gives you a number slightly less than 10; again, a double can't exactly represent that, so it ends up rounding up to 10. But when you do the mod, it can give you that slightly-less-than-0.1 remainder.

Since 0 = 0.1 (mod 0.1), the actual error in the mod is 0.1 - 0.09999999... -- very small.

If you add the result of the % operator to 9 * 0.1, it will give you 1.0 again.
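
For example (a quick check in C#; in this particular case the final rounding lands exactly back on 1.0, so even an equality comparison succeeds):

    using System;

    double r = 1.0 % 0.1;                   // 0.09999999999999995
    Console.WriteLine(r + 9 * 0.1);         // 1
    Console.WriteLine(r + 9 * 0.1 == 1.0);  // True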

Edit

A bit more detail on the rounding stuff -- particularly as this problem is a good example of the perils of mixed precision.

The way a % b is usually computed for floating-point numbers is as a - (b * floor(a/b)). The problem is that it may be done all at once with more internal precision than you'd get by performing those operations one at a time (rounding the result to a floating-point number at each stage), so it can give you a different result. One example a lot of people see is on Intel x86/x87 hardware, which uses 80-bit precision for intermediate computations but only 64-bit precision for values in memory. So the value of b in the equation above comes from memory as a 64-bit floating-point number that's not quite 0.1 (thanks to dan04 for the exact value), and when the hardware computes 1.0/0.1 it gets 9.99999999999999944488848768742172978818416595458984375 (rounded to 80 bits). If you round that to 64 bits, it becomes 10.0; but if you keep the 80-bit intermediate and take the floor of it, it truncates to 9.0, and you end up with .0999999999999999500399638918679556809365749359130859375 as the final answer.
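
You can watch the intermediate rounding change the answer by expanding the formula in plain 64-bit doubles in C# (a sketch; .NET's % for doubles behaves like an exactly computed fmod, unlike this naive expansion):

    using System;

    double a = 1.0, b = 0.1;

    // Naive expansion, with every intermediate rounded to a 64-bit double:
    // a / b rounds up to exactly 10.0, Math.Floor keeps it at 10.0, and
    // 10.0 * b rounds back to exactly 1.0 -- so the formula yields 0.
    Console.WriteLine(a - b * Math.Floor(a / b));  // 0

    // The built-in % is computed without intermediate rounding, so the
    // quotient effectively truncates to 9 and the representation error
    // in b surfaces as a nearly-full-divisor remainder:
    Console.WriteLine(a % b);  // 0.09999999999999995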

So in this case, you're seeing a large apparent error because you're using a discontinuous step function (floor), which means that a very tiny difference in an internal value can push you over the step. But since mod is itself a discontinuous step function, that's to be expected, and the real error here is 0.1 - 0.0999..., as 0.1 is the point of discontinuity in the range of the mod function.

Chris Dodd
0.1000000000000000055511151231257827021181583404541015625, to be exact.
dan04
I appreciate the walk-thru of the calculation. One clarification - you mention "Since 0 = 0.1 (mod 0.1), the actual error in the mod is 0.1 - 0.09999999... -- very small." I'm confused by this because the result returned was 0.099999, while the mathematically correct result is 0. How is the actual error 0.1 - 0.099999? Can you break down the error calculation of this part in a bit more detail?
CrimsonX