In C89, floor() returns a double. Is the following guaranteed to work?

double d = floor(3.0 + 0.5);
int x = (int) d;
assert(x == 3);

My concern is that the result of floor might not be exactly representable in IEEE 754. So d gets something like 2.99999, and x ends up being 2.

For the answer to this question to be yes, all integers within the range of an int have to be exactly representable as doubles, and floor must always return that exactly represented value.

+13  A: 

All integers have an exact floating point representation if your floating point type has enough mantissa bits. Since double uses 53 bits for the mantissa, it can store all 32-bit ints exactly. After all, you could just set the value as the mantissa with a zero exponent.

Mehrdad Afshari
The mantissa with a zero exponent doesn't encompass any number beyond 2^(exp-bias). I think you meant you can just zero out the bits that contribute to the fractional part of the number which is guaranteed to be exactly representable if the exponent is <= 52 or so.
MSN
MSN: I was speaking theoretically for the sake of simplicity. By exponent, I meant the unbiased one, not the actual thing stored in the exponent bits. Basically, if I were talking about the actual bits, double would have a 52-bit mantissa, not 53 ;)
Mehrdad Afshari
+2  A: 

If the result of floor() isn't exactly representable, what do you expect the value of d to be? Surely if you've got the representation of a floating point number in a variable, then by definition it's exactly representable isn't it? You've got the representation in d...

(In addition, Mehrdad's answer is correct for 32 bit ints. In a compiler with a 64 bit double and a 64 bit int, you've got more problems of course...)

EDIT: Perhaps you meant "the theoretical result of floor(), i.e. the largest integer value less than or equal to the argument, may not be representable as an int". That's certainly true. Simple way of showing this for a system where int is 32 bits:

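int max = 0x7fffffff;        /* INT_MAX when int is 32 bits */
double number = max;         /* exact: a double can hold any 32-bit int */
number += 10.0;              /* still exact, but now larger than INT_MAX */
double f = floor(number);    /* already an integer, so floor returns it unchanged */
int oops = (int) f;          /* the value doesn't fit in an int */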

As far as I recall, C leaves the behavior undefined when a floating point value is converted to an integer type that can't represent its truncated value... and that's what is going to happen here.

EDIT: There are other interesting situations to consider too. Here's some C# code and results - I'd imagine at least similar things would happen in C. In C#, double is defined to be 64 bits and so is long.

using System;
class Test
{
    static void Main()
    {
        FloorSameInteger(long.MaxValue/2);
        FloorSameInteger(long.MaxValue-2);
    }

    static void FloorSameInteger(long original)
    {
        double convertedToDouble = original;
        double flooredToDouble = Math.Floor(convertedToDouble);
        long flooredToLong = (long) flooredToDouble;

        Console.WriteLine("Original value: {0}", original);
        Console.WriteLine("Converted to double: {0}", convertedToDouble);
        Console.WriteLine("Floored (as double): {0}", flooredToDouble);
        Console.WriteLine("Converted back to long: {0}", flooredToLong);
        Console.WriteLine();
    }
}

Results:

Original value: 4611686018427387903
Converted to double: 4.61168601842739E+18
Floored (as double): 4.61168601842739E+18
Converted back to long: 4611686018427387904

Original value: 9223372036854775805
Converted to double: 9.22337203685478E+18
Floored (as double): 9.22337203685478E+18
Converted back to long: -9223372036854775808

In other words:

(long) floor((double) original)

isn't always the same as original. This shouldn't come as any surprise - there are more long values than doubles (given the NaN values) and plenty of doubles aren't integers, so we can't expect every long to be exactly representable. However, all 32 bit integers are representable as doubles.

Jon Skeet
I was trying to ask whether it was possible for floor to return a floating point value that is very close to, but not exactly, an integer. I know that some values, like 0.1, cannot be represented exactly. If floor internally hit one of these, then its result might not be an exact integer.
Jim Hunziker
Okay - editing...
Jon Skeet
Jon, thanks for the point. I edited my answer to mention 32 bit explicitly. Since I'm mostly a C# guy, 32 bit ints are the way my mind works ;)
Mehrdad Afshari
A: 

I think you're a bit confused about what you want to ask. floor(3 + 0.5) is not a very good example, because 3, 0.5, and their sum are all exactly representable in any real-world floating point format. floor(0.1 + 0.9) would be a better example, and the real question here is not whether the result of floor is exactly representable, but whether inexactness of the numbers prior to calling floor will result in a return value different from what you would expect, had all numbers been exact. In this case, I believe the answer is yes, but it depends a lot on your particular numbers.

I invite others to criticize this approach if it's bad, but one possible workaround might be to multiply your number by (1.0+0x1p-52) or something similar prior to calling floor (perhaps using nextafter would be better). This could compensate for cases where an error in the last binary place of the number causes it to fall just below rather than exactly on an integer value, but it will not account for errors which have accumulated over a number of operations. If you need that level of numeric stability/exactness, you need to either do some deep analysis or use an arbitrary-precision or exact-math library which can handle your numbers correctly.

R..
No, I think people interpreted my question correctly. I wanted to know if the (integer in a double) result of floor was guaranteed to cast as the integer that it's supposed to represent. I understand that floating point math can be inexact, and if the result of an operation inside the parentheses is slightly different from what is expected, floor could surprise you.
Jim Hunziker