ansaurus

Question

Float32 to Float16

Answer 1

+3 A:

The exponents in your float32 and float16 representations are stored biased, and the biases differ (127 for float32 and 15 for float16 in IEEE 754). You need to subtract the float32 bias from the stored exponent to get the actual exponent, and then add the float16 bias to get the value to store in the float16 representation.

Apart from this detail, I do think it's as simple as that, but I still get surprised by floating-point representations from time to time.
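The unbias-then-rebias step described above can be sketched as follows. This is only an illustration, assuming IEEE 754 binary32 and binary16 layouts and a normal (non-zero, non-subnormal, finite) input; the function name `rebias_exponent` and the constants are made up for this example.

```python
import struct

F32_BIAS = 127  # exponent bias of IEEE 754 binary32
F16_BIAS = 15   # exponent bias of IEEE 754 binary16

def rebias_exponent(x):
    """Return (stored float32 exponent, actual exponent, stored float16 exponent).

    Hypothetical helper: only meaningful for normal float32 values, and it
    does not check that the result fits in the 5-bit float16 exponent field.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    stored32 = (bits >> 23) & 0xFF      # 8-bit biased exponent field
    actual = stored32 - F32_BIAS        # unbias
    stored16 = actual + F16_BIAS        # rebias for float16
    return stored32, actual, stored16
```

For a normal float16 the rebiased exponent has to land in the range 1 to 30; anything outside that range needs the overflow and underflow handling discussed elsewhere on this page.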

EDIT:

Check for overflow when converting the exponent while you're at it.
Your algorithm truncates the last bits of the mantissa a little abruptly. That may be acceptable, but you may want to implement, say, round-to-nearest by looking at the bits that are about to be discarded: "0..." -> round down, "100..001..." -> round up, "100..00" (an exact tie) -> round to even.
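A minimal sketch of that rounding rule, assuming the 23-bit float32 mantissa is being narrowed to the 10-bit float16 mantissa (so 13 bits are discarded). The name `round_mantissa` is hypothetical, and the caller is assumed to handle the carry case.

```python
def round_mantissa(mant23):
    """Round a 23-bit mantissa field to 10 bits, round-to-nearest, ties to even.

    May return 0x400 (one past the 10-bit range) when rounding carries out;
    the caller then has to bump the exponent and reset the mantissa to 0.
    """
    drop = 13                                # 23 - 10 bits are discarded
    kept = mant23 >> drop                    # the 10 bits that survive
    discarded = mant23 & ((1 << drop) - 1)   # the 13 bits about to be lost
    half = 1 << (drop - 1)                   # pattern "100..00"

    if discarded > half:
        kept += 1                            # "100..001...": round up
    elif discarded == half and (kept & 1):
        kept += 1                            # exact tie: round to even
    # otherwise ("0..." or a tie with an even kept value): round down
    return kept
```

Note that a carry out of the mantissa can push the exponent up by one, which is another place where the overflow check matters.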

Pascal Cuoq 2010-06-11 21:53:12

32-bit floating-point numbers in the IEEE 754 standard have 23 bits of mantissa and 8 bits of exponent.

bbudge 2010-06-11 21:57:43

@bbudge ... fair enough I was trying to do it from memory. I took the wrong bit away, evidently ;)

Goz 2010-06-11 22:02:43

Answer 2

+2 A:

Here's the link to an article on IEEE754, which gives the bit layouts and biases.

http://en.wikipedia.org/wiki/IEEE_754-2008

bbudge 2010-06-11 21:58:18

Answer 3

+1 A:

You can't shift the exponent down like you're doing. This will (for example) convert a*2^16 to a*2^4, which is a completely different value. Instead, you must keep the same value if it's within the new range, or saturate to zero or infinity otherwise.
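To illustrate the "keep the value or saturate" idea, here is a rough sketch, not a complete converter. It assumes IEEE 754 layouts, truncates the mantissa instead of rounding, and flushes everything below the smallest normal float16 to zero rather than producing subnormals. The function name `float32_to_float16_bits` is invented for this example.

```python
import struct

def float32_to_float16_bits(x):
    """Return the 16-bit pattern of x as a float16 (simplified sketch)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = (bits >> 16) & 0x8000          # sign bit moved to bit 15
    stored32 = (bits >> 23) & 0xFF        # biased float32 exponent
    mant = bits & 0x7FFFFF                # 23-bit mantissa field

    if stored32 == 0xFF:                  # input is infinity or NaN
        return sign | 0x7C00 | (0x200 if mant else 0)

    exp16 = stored32 - 127 + 15           # same actual exponent, float16 bias
    if exp16 >= 31:                       # too large for float16: infinity
        return sign | 0x7C00
    if exp16 <= 0:                        # too small: zero (assumption: no subnormals)
        return sign
    return sign | (exp16 << 10) | (mant >> 13)   # in range: value preserved (truncated)
```

With this sketch 1.0 maps to 0x3C00, while 65536.0 (that is, 2^16) is out of range and saturates to infinity (0x7C00) instead of silently turning into a different value.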

Mike Seymour 2010-06-11 22:14:13
