It's possible (at least for IEEE 754 float and double values) to compute the greatest floating-point value via (pseudo-code):
~(-1.0) | 0.5
Before we can do our bit-twiddling, we'll have to reinterpret the bit patterns of the floating-point values as integers and then back again. The pointer casts often used for this violate strict aliasing; memcpy does it portably:
uint64_t m_one, half, bits;
double max, d;
d = -1.0;
memcpy(&m_one, &d, sizeof m_one);  /* bit pattern of -1.0 */
d = 0.5;
memcpy(&half, &d, sizeof half);    /* bit pattern of 0.5 */
bits = ~m_one | half;
memcpy(&max, &bits, sizeof max);   /* max now holds the greatest value */
So how does it work? For that, we have to know how the floating-point values are encoded: the highest bit encodes the sign, the next k bits encode the exponent and the lowest bits hold the fractional part. For powers of 2, the fractional part is 0.
The exponent is stored with a bias (offset) of 2**(k-1) - 1, which means an exponent of 0 corresponds to a pattern with all but the highest exponent bit set.
There are two exponent bit patterns with special meaning:
- if no bit is set, the value will be 0 if the fractional part is zero; otherwise, the value is a subnormal
- if all bits are set, the value is either infinity or NaN
This means the greatest regular exponent is encoded by setting all bits except the lowest one, which corresponds to a value of 2**k - 2, or 2**(k-1) - 1 if you subtract the bias.
For double values, k = 11, ie the highest regular exponent is 1023, so the greatest floating-point value is of order 2**1023, which is about 1E+308.
The greatest value will have
- the sign bit set to 0
- all but the lowest exponent bit set to 1
- all fractional bits set to 1
Now, it's possible to understand how our magic numbers work:
- -1.0 has its sign bit set, the exponent is the bias - ie all exponent bits but the highest one are present - and the fractional part is 0
- ~(-1.0) has the sign bit cleared, only the highest exponent bit set and all fractional bits set
- 0.5 has a sign bit and fractional part of 0; the exponent is the bias minus 1, ie all but the highest and lowest exponent bits are present
When we combine ~(-1.0) and 0.5 via bitwise or, we get exactly the bit pattern we wanted.
The computation works for x86 80-bit extended-precision values (aka long double) as well, but the bit-twiddling must be done byte-wise, as there's no integer type large enough to hold the values on 32-bit hardware.
The bias isn't actually required to be 2**(k-1) - 1 - it'll work for an arbitrary bias as long as it is odd. The bias must be odd because otherwise the bit patterns for the exponents of 1.0 and 0.5 would differ in places other than the lowest bit.
If the base b (aka radix) of the floating-point type is not 2, you have to use b**(-1) instead of 0.5 = 2**(-1).
If the greatest exponent value is not reserved, use 1.0 instead of 0.5. This will work regardless of base or bias (meaning the bias is no longer restricted to odd values). The difference in using 1.0 is that the lowest exponent bit won't be cleared.
To summarize:
- ~(-1.0) | 0.5 works as long as the radix is 2, the bias is odd and the highest exponent is reserved
- ~(-1.0) | 1.0 works for any radix or bias as long as the highest exponent is not reserved