ansaurus

Question

Can every float be expressed exactly as a double?

Answer 1

+5 A:

In theory, there is no such value, so "yes", every float should be representable as a double. Converting from a float to a double should involve just tacking four bytes of 00 on the end -- they are stored using the same format, just with different sized fields.

James Curran 2008-11-03 15:42:59

please clarify the 'no' - the two questions in the original post are contradictory, so I can't tell which you're answering.

Alnitak 2008-11-03 15:46:11

Good point, thanks. Fixed.

James Curran 2008-11-03 15:51:17

Inserting 32 0 bits, yes. Inserting them all at the end, no. Some are added to the mantissa, some to the exponent.

Steve Jessop 2008-11-03 18:20:53

Actually, I tell a lie, the exponent of course isn't 0-extended, because the bias is different in a double. So converting involves a small amount of actual arithmetic.

Steve Jessop 2008-11-03 18:26:55

This answer is almost correct - the double type is not just a simple 32-bit extension of the float type. The exponent field is extended from 8 to 11 bits, so only 29 bits, not 32, are left for extending the mantissa field. (Yes, I noticed the discussion is a little old, but for the sake of future generations...)

ysap 2010-03-19 19:32:42
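
The layout described in the comments above can be checked by hand. What follows is only a sketch under stated assumptions: it handles normal floats only (zeros, subnormals, infinities and NaNs are rejected), and the class and method names (WidenBits, widenNormal) are made up for illustration. It splits a float into its 1/8/23-bit fields, re-biases the exponent from 127 to 1023, appends 29 zero bits to the fraction, and compares the result with what the cast produces.

```java
// Sketch: rebuild the double for a normal float by hand and compare with the JVM's cast.
final class WidenBits {
    /** Widens a normal (finite, non-zero, non-subnormal) float using only bit operations. */
    static long widenNormal(float f) {
        int bits = Float.floatToRawIntBits(f);
        long sign = (bits >>> 31) & 0x1L;   // 1 sign bit in both formats
        long exp = (bits >>> 23) & 0xFFL;   // 8-bit biased exponent of the float
        long frac = bits & 0x7FFFFFL;       // 23-bit fraction of the float
        if (exp == 0 || exp == 0xFF) {
            // Assumption of this sketch: special encodings are out of scope.
            throw new IllegalArgumentException("only normal floats are handled");
        }
        long dExp = exp - 127 + 1023;       // re-bias: 127 for float, 1023 for double
        long dFrac = frac << 29;            // 52 - 23 = 29 zero bits appended
        return (sign << 63) | (dExp << 52) | dFrac;
    }

    public static void main(String[] args) {
        float f = 0.1f;
        long byHand = widenNormal(f);
        long byCast = Double.doubleToRawLongBits((double) f);
        System.out.println(Long.toHexString(byHand));
        System.out.println(Long.toHexString(byCast));
        System.out.println(byHand == byCast);
    }
}
```

The only arithmetic is the exponent re-bias; the fraction really is just zero-extended on the right.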

Answer 2

+3 A:

Yes, floats are a subset of doubles. Both floats and doubles have the form (sign * a * 2^b). The difference between floats and doubles is the number of bits in a & b. Since doubles have more bits available, assigning a float value to a double effectively means inserting extra 0 bits.

MSalters 2008-11-03 15:43:01

Answer 3

+2 A:

As everyone has already said, "no". But that's actually a "yes" to the question itself, i.e. every float can be exactly expressed as a double. Confusing. :)

unwind 2008-11-03 15:45:49

Thanks, I've cleared it up

Kip 2008-11-03 15:53:35

Answer 4

+2 A:

If I'm reading the language specification correctly (and as everyone else is confirming), there is no such value.

That is, each claims to hold only IEEE 754 standard values, so casts between the two should incur no change except in the memory used.

(clarification: There would be no change as long as the value was small enough to be held in a float; obviously if the value needed too many bits to be held in a float to begin with, casting from double to float would result in a loss of precision.)

Mitch Flax 2008-11-03 15:50:39

@Mitch: casting from Double to Float will, in general, lose precision. You can do float -> double -> float and get the same answer back. But if you have a double value that's the result of some calculation, it generally can't be cast to Float without having bits discarded.

S.Lott 2008-11-03 15:59:01
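
The asymmetry in this comment can be shown with a minimal Java sketch. The class and helper names (RoundTrip, survivesWidening, fitsInFloat) are hypothetical and exist only for this illustration.

```java
// Sketch: widening a float is lossless, narrowing a double usually is not.
final class RoundTrip {
    /** True if f survives float -> double -> float unchanged (never true for NaN). */
    static boolean survivesWidening(float f) {
        return (float) (double) f == f;
    }

    /** True if d can be narrowed to float and widened back without changing its value. */
    static boolean fitsInFloat(double d) {
        return (double) (float) d == d;
    }

    public static void main(String[] args) {
        System.out.println(survivesWidening(0.1f));  // float value, widened and narrowed again
        System.out.println(fitsInFloat(0.5));        // 0.5 needs only one significant bit
        System.out.println(fitsInFloat(0.1));        // the double nearest 0.1 needs more than 24 bits
        System.out.println(fitsInFloat(1.0 / 3.0));  // result of a double calculation
    }
}
```

A double survives the narrowing only when its significand fits in 24 bits and its exponent is within float range, which is rare for values produced by double arithmetic.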

Thanks, good call - so edited.

Mitch Flax 2008-11-03 16:21:28

Answer 5

A:

Snark: NaNs will compare differently after (or indeed before) conversion.

This does not, however, invalidate the answers already given.

dmckee 2008-11-03 16:10:20

Answer 6

A:

I took the code you listed and decided to try it in C++ since I thought it might execute a little faster and it is significantly easier to do unsafe casting. :-D

I found out that for valid numbers, the conversion works and you get the exact bitwise representation after the cast. However, for non-numbers, e.g. 1.#QNAN0, etc., the result will use a simplified representation of the non-number rather than the exact bits of the source. For example:

** FAILURE ** 2140188725 | 1.#QNAN0 -- 0xa0000000 0x7ffa1606

I cast an unsigned int to float then to double and back to float. The number 2140188725 (0x7F90B035) results in a NaN, and converting to double and back is still a NaN, but not the exact same NaN.

Here is the simple C++ code:

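#include <cstdio>
#include <cstring>

typedef unsigned int uint;

// Copy a float's bit pattern into an unsigned int without converting the value.
static uint bits_of(float f) { uint u; std::memcpy(&u, &f, sizeof u); return u; }

int main()
{
    uint i = 0;
    do {
        float f1;
        std::memcpy(&f1, &i, sizeof f1);  // reinterpret the bits; avoids the aliasing cast
        double d = f1;
        float f2 = (float)d;
        if (f1 != f2)  // note: this is also true whenever f1 is a NaN
            std::printf("**** FAILURE **** %u | %f -- 0x%08x 0x%08x\n", i, f1, bits_of(f1), bits_of(f2));
        if ((i % 1000000) == 0)
            std::printf("Iteration: %u\n", i);
    } while (++i != 0);  // wraps to 0 after 0xFFFFFFFF, so every pattern is visited
}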

Ryan 2008-11-03 16:15:19

Recall that any NaN compares as not equal to _everything_. Thus you can detect NaN with code like "if (a != a){/* have NaN */ }".

dmckee 2008-11-03 16:19:35
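
The self-comparison trick from this comment works the same way in Java. A minimal sketch, with a hypothetical helper name (isNaNBySelfCompare); in real code Float.isNaN does the same job more readably.

```java
// Sketch: NaN is the only float value that is not equal to itself.
final class NanCheck {
    static boolean isNaNBySelfCompare(float a) {
        return a != a;
    }

    public static void main(String[] args) {
        float fromBits = Float.intBitsToFloat(0x7F90B035); // the NaN pattern quoted in the answer
        System.out.println(isNaNBySelfCompare(fromBits));
        System.out.println(isNaNBySelfCompare(Float.NaN));
        System.out.println(isNaNBySelfCompare(1.0f));
        System.out.println(isNaNBySelfCompare(Float.POSITIVE_INFINITY));
    }
}
```

This is why a round-trip test written as f1 != f2 reports every NaN as a failure, even when the conversion itself is fine.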

"I cast an unsigned int to float" - technically you didn't, you cast an int* to float* and dereferenced it. Casting an int to float performs a numeric conversion.

Steve Jessop 2008-11-03 18:24:58
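
The distinction drawn here (reinterpreting bits versus converting a number) has a direct Java counterpart. This is a sketch; the class and method names are invented for illustration, while Float.intBitsToFloat is the standard library call.

```java
// Sketch: the same int gives very different floats depending on how it is turned into one.
final class ReinterpretVsConvert {
    /** Treats the 32 bits as an IEEE 754 single, like the pointer cast in the C++ code. */
    static float reinterpret(int bits) {
        return Float.intBitsToFloat(bits);
    }

    /** Ordinary numeric conversion: the float nearest to the integer's value. */
    static float convert(int value) {
        return (float) value;
    }

    public static void main(String[] args) {
        int i = 0x3F800000;                 // 1065353216 as an integer
        System.out.println(reinterpret(i)); // the float whose bit pattern is 0x3F800000
        System.out.println(convert(i));     // the integer's value as a float
    }
}
```

Only the reinterpreting form enumerates every float when the int is looped over its full range.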

Answer 7

+1 A:

@KenG: This code:-

float a = 0.1F
println "a=${a}"
double d = a
println "d=${d}"

prints different text for a and d, but not because the float's value is lost in the conversion. The question was "is there a float value that cannot be represented as a double", which this code doesn't prove. Although 0.1 can't be stored exactly in a float, the value that a is given (which isn't exactly 0.1) can be stored as a double (which also won't be exactly 0.1). Assuming an Intel FPU, the bit pattern for a is:

0 01111011 10011001100110011001101

and the bit pattern for d is:

0 01111111011 100110011001100110011010 (followed by lots more zeros)

which has the same sign, exponent (-4 in both cases) and the same fractional part (separated by spaces above). The difference in the output is due to the position of the second non-zero digit in the number (the first is the 1 after the point), which can only be represented with a double. The code that outputs the string format stores intermediate values in memory and is specific to floats and doubles (i.e. there is a function double-to-string and another float-to-string). If the to-string function were optimised to use the FPU stack to store the intermediate results of the to-string process, the output would be the same for float and double, since the FPU uses the same, larger format (80 bits) for both float and double.

There are no float values that can't be stored identically in a double, i.e. the set of float values is a subset of the set of double values.

Skizz

Skizz 2008-11-03 17:28:31
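
A Java version of the snippet above makes the point concrete. This is a sketch, and it relies on one fact about the Java library rather than about the original Groovy environment: Float.toString and Double.toString each print only as many digits as are needed to tell the value apart from its neighbours in that type, so the same number prints differently as a float and as a double. The helper name exactDecimal is hypothetical; BigDecimal's double constructor gives the exact binary value in decimal.

```java
import java.math.BigDecimal;

// Sketch: a and d hold the same number; only the printed text differs.
final class PrintWidened {
    /** Exact decimal expansion of the binary value, with no rounding. */
    static String exactDecimal(double v) {
        return new BigDecimal(v).toString();
    }

    public static void main(String[] args) {
        float a = 0.1f;
        double d = a;
        System.out.println("a=" + a);         // a=0.1
        System.out.println("d=" + d);         // d=0.10000000149011612
        System.out.println(d == a);           // true: a is widened for the comparison
        System.out.println(exactDecimal(a));  // exact value stored in the float
        System.out.println(exactDecimal(d));  // exact value stored in the double
    }
}
```

The last two lines print the same digits, which is the subset claim in miniature.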

Answer 8

+10 A:

Yes.

Proof by enumeration of all possible cases:

public class TestDoubleFloat  {
    public static void main(String[] args) {
        // A long counter lets the loop reach Integer.MAX_VALUE without overflowing.
        for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
            // Reinterpret each 32-bit pattern as a float, then widen and narrow it.
            float f1 = Float.intBitsToFloat((int) i);
            double d = (double) f1;
            float f2 = (float) d;
            if (f1 != f2) {
                if (Float.isNaN(f1) && Float.isNaN(f2)) {
                    continue; // ok, NaN
                }
                throw new AssertionError("oops: " + f1 + " != " + f2);
            }
        }
    }
}

This finishes in 12 seconds on my machine. 32 bits is a small search space.

mfx 2008-11-03 18:27:44

This doesn't actually test all numbers representable by Floats; Floats cannot exactly represent every integer above 2^24.

MSN 2009-01-22 19:30:25

It enumerates all possible floats by enumerating all ints (which have the same size) and converting their bit patterns to float.

mfx 2009-01-22 22:19:27

Answer 9

A:

The answer to the first question is yes; the answer to the 'in other words', however, is no. If you change the test in the code to be if (!(f1 != f2)), the answer to the second question becomes yes -- it will print 'Success' for all float values.

Chris Dodd 2008-11-04 00:36:42

Answer 10

A:

In theory, every normal single can have its exponent and mantissa padded to create a double; remove the padding and you get the original single back.

Problems start when you go from theory to reality. I don't know whether you are interested in theory or implementation. If it is implementation, you can rapidly get into trouble.

IEEE is a horrible format. My understanding is that it was intentionally designed to be so tough that nobody could meet it, letting the market catch up to Intel (this was a while back) and allowing for more competition. If that is true, it failed; either way, we are stuck with this dreadful spec. Something like the TI format is far superior for the real world in so many ways. I have no connection to either company or any of these formats.

Thanks to this spec, there are very few if any FPUs that actually meet it (in hardware, or even in hardware plus the operating system), and those that do often fail on the next generation (google: TestFloat). The problems these days tend to lie in int to float and float to int, not in single to double and double to single as you have specified above. Of course, what operation is the FPU going to perform to do that conversion? Add 0? Multiply by 1? It depends on the FPU and the compiler.

The problem with IEEE related to your question above is that many numbers (not every number) can be represented in more than one way. If I wanted to break your code, I would start with minus zero in the hope that one of the two operations would convert it to a plus zero. Then I would try denormals. And it should fail with a signaling NaN, but you called that out as a known exception.

The problem is that equal sign. Here is rule number one about floating point: never use an equal sign. Equals is a bit comparison, not a value comparison; if you have two values represented in different ways (plus zero and minus zero, for example), the bit comparison will fail even though it's the same number. Greater than and less than are done in the FPU; equals is done with the integer ALU.

I realize that you probably used the equal to explain the problem and not necessarily the code you wanted to succeed or fail.

dwelch 2008-11-04 15:48:13

== is not a bit comparison. In C, C++, C#, Java, JavaScript, etc., 0 == negative 0. The Double.equals() method does a bit comparison.

A. Rex 2009-01-13 16:52:36
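
The difference between the two comparisons is easy to see in Java. A short sketch, with hypothetical helper names (valueEqual, bitEqual); bitEqual mirrors what Double.equals is documented to compare, namely the results of Double.doubleToLongBits.

```java
// Sketch: == compares values, Double.equals compares bit patterns.
final class ZeroAndNanEquality {
    static boolean valueEqual(double a, double b) {
        return a == b;
    }

    static boolean bitEqual(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        System.out.println(valueEqual(0.0, -0.0));                          // true
        System.out.println(Double.valueOf(0.0).equals(-0.0));               // false
        System.out.println(valueEqual(Double.NaN, Double.NaN));             // false
        System.out.println(Double.valueOf(Double.NaN).equals(Double.NaN));  // true
    }
}
```

So a test written with == cannot notice a lost sign on zero, and a test written with equals() does not trip over NaN.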

-1 for anti-IEEE-float propaganda with misleading and outright incorrect information.

R.. 2010-08-01 20:03:20

@R. I tell you what: go and build a few FPUs from scratch that pass TestFloat level 3 and that meet the IEEE spec in hardware without software kludges (like most of the ones on the market), and then let's chat about how to build a better FPU with a better spec.

dwelch 2010-08-02 02:35:39
