I was wondering how the bits are organized in floats (4 bytes), doubles (8 bytes), and half floats (2 bytes, used in OpenGL implementations).
Also, how can I convert from one to another?
In essence, each of these formats has one sign bit, x exponent bits (E), and y mantissa bits (M).
If the sign bit is 1, the number is negative, else it is positive.
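To get the magnitude, you take (1 + M) * 2^(E - k), where M is the mantissa bits read as a fraction in [0, 1), E is the exponent bits read as an unsigned integer, and k (called the "exponent bias") depends on the format.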
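As a rough illustration of the formula above, here is a small Python sketch that pulls a single-precision float apart and rebuilds its magnitude. The function names are made up for this example, and the constants (8 exponent bits, 23 mantissa bits, bias 127) are the single-precision values; the other formats would need their own.

```python
import struct

def decode_float32(value):
    """Split a number, stored as IEEE 754 single precision, into its fields."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits (E)
    fraction = bits & 0x7FFFFF       # 23 mantissa bits, still as an integer
    return sign, exponent, fraction

def magnitude_from_fields(exponent, fraction, bias=127, fraction_bits=23):
    """Apply (1 + M) * 2^(E - k); only valid for normal (non-special) values."""
    m = fraction / (1 << fraction_bits)  # M as a fraction in [0, 1)
    return (1 + m) * 2.0 ** (exponent - bias)

sign, exponent, fraction = decode_float32(-6.25)
print(sign, exponent, fraction)
print(magnitude_from_fields(exponent, fraction))
```

This only handles ordinary values; the special bit patterns do not follow the formula.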
It's worth noting that certain combinations of sign, exponent, and mantissa are "special" values, like 0, -inf, +inf, and NaN.
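To make the special values a bit more concrete, here is a sketch (single precision only, with hypothetical helper names) that classifies a bit pattern by its exponent and mantissa fields. It relies on the IEEE 754 convention that an all-ones exponent marks infinities and NaNs and an all-zeros exponent marks zeros and subnormals.

```python
import struct

def float32_bits(value):
    """Return the 32-bit pattern of a number stored as single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits

def classify_float32(bits):
    """Name the kind of value a single-precision bit pattern encodes."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        return "inf" if fraction == 0 else "nan"
    if exponent == 0:
        # All-zeros exponent: zero, or a subnormal (no implicit leading 1).
        return "zero" if fraction == 0 else "subnormal"
    return "normal"

for value in (0.0, -0.0, float("inf"), float("-inf"), float("nan"), 1.0):
    print(value, classify_float32(float32_bits(value)))
```

The sign bit is ignored here, so +inf and -inf (and +0 and -0) land in the same category.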
For the specifics (values of x, y, and k) see Wikipedia for single precision (4 bytes), double precision (8 bytes), and half precision (2 bytes).
Note that these are all specified by IEEE 754, so googling that might give you helpful results. :)
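On the conversion part of the question: in practice you rarely shuffle the bits yourself. As one possible sketch, Python's `struct` module has format codes for all three sizes (`e` for half, `f` for single, `d` for double), so packing a value and unpacking it again shows what survives the narrower format. The helper name `round_trip` is just for this example.

```python
import struct

def round_trip(value, fmt):
    """Store a Python float (a double) in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

x = 0.1
as_half = round_trip(x, "<e")    # 2 bytes, half precision
as_single = round_trip(x, "<f")  # 4 bytes, single precision
as_double = round_trip(x, "<d")  # 8 bytes, double precision

print(as_half, as_single, as_double)

# The raw bytes are what you would hand to an API expecting that format.
print(struct.pack("<e", x).hex(), struct.pack("<f", x).hex())
```

Going to a narrower format rounds the value, and a value too large for the target format makes `struct.pack` raise `OverflowError`, so real code would want to handle that case.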