It's possible (at least for IEEE 754 float and double values) to compute the greatest floating-point value via (pseudo-code):
~(-1.0) | 0.5
Before we can do our bit-twiddling, we'll have to reinterpret the bit patterns of the floating-point values as integers and then back again. The pointer casts often used for this violate strict aliasing; memcpy does it portably:
uint64_t m_one, half, bits;
double max, d;
d = -1.0;
memcpy(&m_one, &d, sizeof m_one);  /* bit pattern of -1.0 */
d = 0.5;
memcpy(&half, &d, sizeof half);    /* bit pattern of 0.5 */
bits = ~m_one | half;
memcpy(&max, &bits, sizeof max);   /* max now holds the greatest value */
So how does it work? For that, we have to know how the floating-point values are encoded: the highest bit encodes the sign, the next k bits encode the exponent and the lowest bits hold the fractional part. For powers of 2, the fractional part is 0.
The exponent is stored with a bias (offset) of 2**(k-1) - 1, which means an exponent of 0 corresponds to a pattern with all but the highest exponent bit set.
There are two exponent bit patterns with special meaning:
- if no bit is set, the value will be 0 if the fractional part is zero; otherwise, the value is a subnormal
- if all bits are set, the value is either infinity or NaN
This means the greatest regular exponent is encoded by setting all bits except the lowest one, which corresponds to a value of 2**k - 2, or 2**(k-1) - 1 if you subtract the bias.
For double values, k = 11, ie the highest regular exponent is 1023, so the greatest floating-point value is of order 2**1023, which is about 1E+308.
The greatest value will have
- the sign bit set to 0
- all but the lowest exponent bit set to 1
- all fractional bits set to 1
Now, it's possible to understand how our magic numbers work:
- -1.0 has its sign bit set, the exponent is the bias - ie all exponent bits but the highest one are present - and the fractional part is 0
- ~(-1.0) has the sign bit cleared, only the highest exponent bit set and all fractional bits set
- 0.5 has a sign bit and fractional part of 0; the exponent is the bias minus 1, ie all but the highest and lowest exponent bits are present
When we combine ~(-1.0) and 0.5 via bitwise or, we get exactly the bit pattern we wanted.
The computation works for x86 80-bit extended-precision values (aka long double) as well, but the bit-twiddling must be done byte-wise, as there's no integer type large enough to hold the values on 32-bit hardware.
The bias isn't actually required to be 2**(k-1) - 1 - it'll work for an arbitrary bias as long as it is odd. The bias must be odd because otherwise the bit patterns for the exponents of 1.0 and 0.5 would differ in places other than the lowest bit.
If the base b (aka radix) of the floating-point type is not 2, you have to use b**(-1) instead of 0.5 = 2**(-1).
If the greatest exponent value is not reserved, use 1.0 instead of 0.5. This will work regardless of base or bias (meaning the bias is no longer restricted to odd values). The difference in using 1.0 is that the lowest exponent bit won't be cleared.
To summarize:
- ~(-1.0) | 0.5 works as long as the radix is 2, the bias is odd and the highest exponent is reserved
- ~(-1.0) | 1.0 works for any radix or bias as long as the highest exponent is not reserved