views:

125

answers:

3
+4  Q: 

Float32 to Float16

Can someone explain to me how I convert a 32-bit floating point value to a 16-bit floating point value?

(s = sign e = exponent and m = mantissa)

If 32-bit float is 1s7e24m
And 16-bit float is 1s5e10m

Then is it as simple as doing?

int     fltInt32;
short   fltInt16;
memcpy( &fltInt32, &flt, sizeof( float ) );

fltInt16 = (fltInt32 & 0x00FFFFFF) >> 14;
fltInt16 |= ((fltInt32 & 0x7f000000) >> 26) << 10;
fltInt16 |= ((fltInt32 & 0x80000000) >> 16);

I'm assuming it ISN'T that simple ... so can anyone tell me what you DO need to do?

Edit: I can see I've got my exponent shift wrong ... so would THIS be better?

fltInt16 =  (fltInt32 & 0x007FFFFF) >> 13;
fltInt16 |= (fltInt32 & 0x7c000000) >> 13;
fltInt16 |= (fltInt32 & 0x80000000) >> 16;

I'm hoping this is correct. Apologies if I'm missing something obvious that has already been said. It's almost midnight on a Friday night ... so I'm not "entirely" sober ;)

Edit 2: Ooops. Buggered it again. I want to lose the top 3 bits not the lower! So how about this:

fltInt16 =  (fltInt32 & 0x007FFFFF) >> 13;
fltInt16 |= (fltInt32 & 0x0f800000) >> 13;
fltInt16 |= (fltInt32 & 0x80000000) >> 16;
+3  A: 

The exponents in your float32 and float16 representations are probably biased, and biased differently. You need to unbias the exponent you got from the float32 representation to get the actual exponent, and then to bias it for the float16 representation.

Apart from this detail, I do think it's as simple as that, but I still get surprised by floating-point representations from time to time.
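To make the re-biasing step concrete, here is a minimal sketch. It assumes IEEE 754 layouts (bias 127 for float32, bias 15 for float16) and a normal, non-zero input; the helper names are made up for illustration, and no range check is done here.

```c
#include <stdint.h>
#include <string.h>

/* Assumed IEEE 754 biases: 127 for binary32, 15 for binary16. */
enum { F32_BIAS = 127, F16_BIAS = 15 };

/* Hypothetical helper: unbiased exponent of a normal float32 value. */
static int f32_unbiased_exponent(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* reinterpret without aliasing issues */
    return (int)((bits >> 23) & 0xFF) - F32_BIAS;
}

/* Hypothetical helper: re-bias for the float16 exponent field.
   The result is only a valid normal-number field when it lies in 1..30;
   the caller still has to check that. */
static int f16_biased_exponent(int unbiased)
{
    return unbiased + F16_BIAS;
}
```

For example, 1.0f has an unbiased exponent of 0, which becomes 15 in the float16 exponent field.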

EDIT:

  1. While you are re-biasing the exponent, also check for overflow: the float16 exponent field is only 5 bits wide.

  2. Your algorithm truncates the last bits of the mantissa rather abruptly. That may be acceptable, but you may want to implement, say, round-to-nearest by looking at the bits that are about to be discarded: "0..." -> round down, "100..001..." -> round up, "100..00" -> round to even.
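The rounding rule in point 2 could look roughly like this. It is only a sketch, assuming a 23-bit float32 mantissa field being cut down to 10 bits; the function name is invented for the example.

```c
#include <stdint.h>

/* Hypothetical helper: round a 23-bit mantissa field to 10 bits,
   round-to-nearest with ties going to even.
   The result can be 0x400 (a carry out of the 10 bits), in which case
   the caller has to bump the exponent and use a mantissa of 0. */
static uint32_t round_mantissa_23_to_10(uint32_t mant23)
{
    uint32_t kept    = mant23 >> 13;       /* the 10 bits that survive     */
    uint32_t dropped = mant23 & 0x1FFFu;   /* the 13 bits being discarded  */
    const uint32_t halfway = 0x1000u;      /* "100..00" in the dropped bits */

    if (dropped > halfway || (dropped == halfway && (kept & 1u)))
        kept++;                            /* round up, or tie to even     */
    return kept;
}
```

Note the carry case: a mantissa of all ones rounds up to 0x400, which no longer fits in 10 bits, so the overflow check from point 1 has to happen after rounding as well.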

Pascal Cuoq
32-bit floating point numbers in the IEEE 754 standard have 23 bits of mantissa and 8 bits of exponent.
bbudge
@bbudge ... fair enough I was trying to do it from memory. I took the wrong bit away, evidently ;)
Goz
+2  A: 

Here's the link to an article on IEEE754, which gives the bit layouts and biases.

http://en.wikipedia.org/wiki/IEEE_754-2008

bbudge
+1  A: 

You can't shift the exponent down like you're doing. This will (for example) convert a*2^16 to a*2^4, which is a completely different value. Instead, you must keep the same value if it's within the new range, or saturate to zero or infinity otherwise.
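A sketch of what "keep the value or saturate" might look like, assuming IEEE 754 binary32 input and binary16 output. The function name is made up, values too small for a normal half are simply flushed to zero (no subnormal output), and NaNs are collapsed to a single quiet NaN pattern; treat it as an illustration rather than a complete converter.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical converter: float32 -> float16 bit pattern. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint16_t sign  = (uint16_t)((bits >> 16) & 0x8000u);
    uint32_t exp32 = (bits >> 23) & 0xFFu;
    uint32_t mant  = bits & 0x007FFFFFu;

    if (exp32 == 0xFFu)                      /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));

    int exp16 = (int)exp32 - 127 + 15;       /* unbias, then re-bias */

    if (exp16 >= 31)                         /* too large: saturate to infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (exp16 <= 0)                          /* too small: saturate to signed zero */
        return sign;

    /* In range: keep the value, rounding the mantissa to nearest-even. */
    uint32_t half    = ((uint32_t)exp16 << 10) | (mant >> 13);
    uint32_t dropped = mant & 0x1FFFu;
    if (dropped > 0x1000u || (dropped == 0x1000u && (half & 1u)))
        half++;                              /* a carry may reach the exponent, up to 0x7C00 (infinity) */

    return (uint16_t)(sign | half);
}
```

With this sketch, 1.0f maps to 0x3C00 and 65504.0f (the largest finite half) maps to 0x7BFF, while 65536.0f, i.e. 2^16, is out of range and saturates to 0x7C00 (infinity) instead of silently turning into a different value.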

Mike Seymour