ansaurus

Question

How are floats and doubles represented in C++ (gcc)?

Answer 1

+14 A:

Try this link: http://en.wikipedia.org/wiki/IEEE_754

I just found that this might be a little more helpful: http://en.wikipedia.org/wiki/IEEE_754-1985

This is the IEEE standard for floating-point numbers. There is one version from 1985 and a revised edition from 2008. float is 32 bits and double is 64 bits (explained in the second link).

Edit: Thanks to the comment by Don, here's the link to the description of Intel's 80-bit floating-point format: http://en.wikipedia.org/wiki/Extended_precision

Tobias Langner 2009-07-15 14:11:37

I'd add that this is the overwhelmingly common case, but that the choice is influenced by the hardware as well.

Don Wakefield 2009-07-15 14:17:01

Answer 2

+3 A:

It might also be worth noting that there is a static bool const member of std::numeric_limits, is_iec559, which is naturally only meaningful for floating point types (it is false for every other type). The name is pretty self-explanatory...

fow 2009-07-15 14:29:09

Self-explanatory as long as you know that IEC 559 is the name under which ISO/IEC standardized what is better known as IEEE 754. You should also know that there can be false negatives: that standard covers more than a representation (in fact, I'm not sure that the 1985 version mandates a representation); it also covers behavior (rounding, denormals, ...). Setting that member to true implies that you obey all of that, and that is hard on some platforms. As far as I know, the first versions of Java mandated it and then backed off because the performance hit was too high.

AProgrammer 2009-07-15 15:08:53

Answer 3

A:

To actually interpret it you would probably not want to treat it as bytes anyway, because the mantissa boundaries don't align to an 8-bit boundary.

Something along the lines of:

mantisa =  (*(unsigned int *)&floatVal) & MANTISA_MASK;
exp     = ((*(unsigned int *)&floatVal) & EXP_MASK    ) >> EXP_SHIFT;
sign    = ((*(unsigned int *)&floatVal) & SIGN_MASK   ) >> SIGN_SHIFT;

Would let you pull it apart to play with the juicy center.

EDIT:

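    #include <stdio.h>

    int main(void)
    {
        float a = 4;
        unsigned int exp, sign, mantisa;
        int i;

        for (i = 0; i < 4; i++)
        {
            /* Read the float's 32 bits through an unsigned int pointer, then mask out each field. */
            exp     = (*((unsigned int *)&a) >> 23) & 0xFF;           /* 8-bit biased exponent */
            sign    = (*((unsigned int *)&a) >> 31) & 0x01;           /* sign bit */
            mantisa = ((*((unsigned int *)&a)) & 0x7FFFFF) | 0x800000; /* 23 stored bits plus the implicit leading 1 (normal numbers only) */

            printf("a       = %08x\r\n", *((unsigned int *)&a));
            printf("a       = %f\r\n", a);
            printf("exp     = %u, %02x\r\n", exp, exp);
            printf("sign    = %u, %02x\r\n", sign, sign);
            printf("mantisa = %u, %02x\r\n\r\n", mantisa, mantisa);
            a = -a / 2;
        }

        return 0;
    }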

Produces:

    a       = 40800000
    a       = 4.000000
    exp     = 129, 81
    sign    = 0, 00
    mantisa = 8388608, 800000

    a       = c0000000
    a       = -2.000000
    exp     = 128, 80
    sign    = 1, 01
    mantisa = 8388608, 800000

    a       = 3f800000
    a       = 1.000000
    exp     = 127, 7f
    sign    = 0, 00
    mantisa = 8388608, 800000

    a       = bf000000
    a       = -0.500000
    exp     = 126, 7e
    sign    = 1, 01
    mantisa = 8388608, 800000

    Press any key to continue . . .

NoMoreZealots 2009-07-15 16:34:29

Yes, but this is undefined behavior (accessing a float value using an int pointer). You are guaranteed to be able to access the bytes of an object as an array of unsigned char (in C, at least - I'm not sure of the comparable wording in the C++ standard), but not as other data types, in general. In particular, the compiler's optimizer is entitled to assume that you haven't done this, and apply optimizations that can break your code if you have.

dewtell 2009-07-15 18:08:33

Also, you would need to take the address of floatVal before casting it to another pointer type anyhow. An example of what I was saying about the optimizer: if the compiler was holding the current value of floatVal in a register, the optimizer would be entitled to assume that it didn't need to spill the current value to memory before executing this code, since those int pointers can't legally be accessing the float value. So even if the undefined behavior didn't make your computer explode, you could easily be picking up random garbage rather than the latest value of floatVal with this code.

dewtell 2009-07-15 18:23:00

This works on both big and little endian.

NoMoreZealots 2009-07-15 19:10:43

The language specification is architecture independent, therefore it can't define the bit definition of a floating point number. E.g., TI had its own floating point format on some of its old DSPs. If you are specifying IEEE format then you are defining the bit definition of a floating point number.

NoMoreZealots 2009-07-15 20:18:32

One of the permissible implementations of undefined behavior is for it to "work," (i.e., do what the programmer expects) silently. It is also permitted to not work, arbitrarily. The compiler isn't usually going to go out of its way to sabotage your code, but it isn't required to spend any effort to make sure it keeps working as you crank up the optimization level, add other unrelated changes to the code, upgrade to a new compiler version, port to a new platform, or compile on alternate Tuesdays.

dewtell 2009-07-15 20:46:48

NoMoreZealots 2009-07-15 21:45:36

It's not just a question of different floating point formats. Accessing the float as a sequence of unsigned chars is just as subject to different formats, but that is implementation-defined behavior, not undefined (choice of behaviors, but the implementation has to document what they do). If the implementation chooses to also document that the access via int pointers works in the same way, they are free to do so, but the standard does not require them to do it.

dewtell 2009-07-15 22:23:02

NoMoreZealots 2009-07-16 17:23:58
