views: 639
answers: 6

This is something that's been on my mind for years, but I never took the time to ask before.

Many (pseudo-)random number generators generate a random number between 0.0 and 1.0. Mathematically, there are infinitely many numbers in this range, but double is a floating-point type and therefore has only finite precision.

So the questions are:

  1. Just how many double numbers are there between 0.0 and 1.0?
  2. Are there just as many numbers between 1 and 2? Between 100 and 101? Between 10^100 and 10^100+1?

Note: if it makes a difference, I'm interested in Java's definition of double in particular.

+2  A: 
  1. 2^53 - the size of the significand/mantissa of a 64-bit floating-point number, including the hidden bit.
  2. Roughly yes, as the significand is fixed but the exponent changes.

See the wikipedia article for more information.

Yann Ramin
Your answer for 2 contradicts how I understand the working of FP.
polygenelubricants
I think `1` is wrong because the hidden bit is always one -- therefore, `2^52`, **not** `2^53` _distinct_ values (between adjacent powers of two, one included and the next one excluded -- **not** between 0.0 and 1.0!).
Alex Martelli
+3  A: 

The article Java's new math, Part 2: Floating-point numbers from IBM offers the following code snippet to solve this (in floats, but I suspect it works for doubles as well):

public class FloatCounter {

    public static void main(String[] args) {
        float x = 1.0F;
        int numFloats = 0;
        while (x <= 2.0) {
            numFloats++;
            System.out.println(x);
            x = Math.nextUp(x);
        }
        System.out.println(numFloats);
    }
}

They have this comment about it:

It turns out there are exactly 8,388,609 floats between 1.0 and 2.0 inclusive; large but hardly the uncountable infinity of real numbers that exist in this range. Successive numbers are about 0.0000001 apart. This distance is called an ULP for unit of least precision or unit in the last place.

Mark Rushakoff
Yep, but that's for `float`, _not_ `double` -- `float`s have 23 bits' worth of fraction, so `2**23 -> 8388608` different values between adjacent powers of two (the "inclusive" part of course means you have to count one more, the next power of two). `double`s have 52-bit fractions!
Alex Martelli
@Alex: I guess I'll have to leave the program (modified for doubles) running until the end of the universe or so before I can get the results... :(
Mark Rushakoff
@Mark: I feel dumb; I just wrote the `double` equivalent and thought "Hey, I'll answer my own question in about 5 minutes..."
polygenelubricants
@polygene: This feels like a Project Euler problem where the obvious approach is infeasible to compute, but there must be some brilliantly simple formula to solve for the arbitrary case...
Mark Rushakoff
@Mark, maybe not even with a truly supercharged supercomputer: on a machine taking just a nanosecond to run the inner loop, counting with `double` between adjacent powers of two would take about 52 days (the `println` of course would be _very_ unlikely to run that fast no matter what, so let's assume that one statement goes away;-). I think it's feasible to take a year or less on a powerful but realistic machine;-).
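There is in fact a loop-free way to get the count (my own sketch, not from the IBM article): `Double.doubleToLongBits` exposes the raw IEEE-754 bit pattern, and for positive doubles the bit patterns are ordered the same way as the values, so subtracting two patterns counts the representable doubles between them.

```java
public class DoubleCounter {
    public static void main(String[] args) {
        // For positive IEEE-754 doubles, bit patterns increase monotonically
        // with value, so the number of doubles in [1.0, 2.0) is just the
        // difference of the two endpoint bit patterns -- no looping needed.
        long count = Double.doubleToLongBits(2.0) - Double.doubleToLongBits(1.0);
        System.out.println(count); // 4503599627370496, i.e. 2^52
    }
}
```

This finishes instantly, versus the ~52 days estimated for the counting loop above.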
Alex Martelli
+26  A: 

Java doubles are in IEEE-754 format, therefore they have a 52-bit fraction; between any two adjacent powers of two (inclusive of one and exclusive of the next one), there will therefore be 2 to the 52nd power different doubles (i.e., 4503599627370496 of them). For example, that's the number of distinct doubles between 0.5 included and 1.0 excluded, and exactly that many also lie between 1.0 included and 2.0 excluded, and so forth.

Counting the doubles between 0.0 and 1.0 is harder than doing so between powers of two, because there are many powers of two included in that range, and one also gets into the thorny issue of denormalized numbers. 10 of the 11 bits of the exponent cover the range in question, so, including denormalized numbers (and, I think, a few kinds of NaN), you'd have 1024 times as many doubles as lie between adjacent powers of two -- no more than 2**62 in total anyway. Excluding denormalized &c, I believe the count would be 1023 times 2**52.
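That 1023 × 2**52 figure can be checked directly with bit patterns (my own sketch, not part of the answer): counting the patterns from that of 0.0 up to, but excluding, that of 1.0 covers +0.0, every denormal, and every normal below 1.0.

```java
public class ZeroToOneCount {
    public static void main(String[] args) {
        // Bit pattern of 1.0 is 0x3FF0000000000000; patterns 0x0 .. (that - 1)
        // are exactly +0.0, the denormals, and the normals below 1.0.
        long count = Double.doubleToLongBits(1.0) - Double.doubleToLongBits(0.0);
        System.out.println(count == 1023L << 52);              // true: 1023 * 2^52
        System.out.println(count == (1L << 62) - (1L << 52));  // true: 2^62 - 2^52
    }
}
```

So the count of representations in [0.0, 1.0) comes out to exactly 2^62 - 2^52, consistent with the "no more than 2**62" ballpark above.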

For an arbitrary range like "100 to 100.1" it's even harder, because the upper bound cannot be exactly represented as a double (not being an exact multiple of any power of two). As a handy approximation, since doubles are evenly spaced between adjacent powers of two, you could say that said range is 0.1 / 64th of the span between the surrounding powers of two (64 and 128), so you'd expect about

(0.1 / 64) * 2**52

distinct doubles -- which comes to 7036874417766.4004... give or take one or two;-).

Alex Martelli
@Alex: just to note, when I wrote 100 to 100.1 I miswrote. I meant 100 to 101. Basically, between N and N+1 for arbitrary N.
polygenelubricants
@Alex: so let me get this straight: there can be no more than `2**64` possible double values (since it's a 64 bit type), and apparently a HUGE proportion of those values lie between `0..1`?
polygenelubricants
@polygene, yes and yes -- specifically, about one quarter of the possible values (for any "normal" floating point representation of any base and exponent vs fraction lengths) lie between 0.0 and 1.0 (another quarter between 1.0 and infinity, and the remaining half on the negative half of the real axis). Essentially, half the values of the exponent (with a normal bias, halfway within its range) represent negative powers of the base, therefore numbers < 1.0.
Alex Martelli
@polygene, between any `X` and `Y>X` (two numbers that lie between `2**N` and `2**(N+1)` for some `N`) just the same approximate computation applies: there will be around `(Y-X)/(2**N) * 2**52` distinct `double` numbers in that range.
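For N-to-N+1 ranges whose endpoints are exactly representable, this approximation can be checked against an exact bit-pattern count (my own sketch, using the fact that bit patterns of positive doubles are ordered like their values):

```java
public class RangeCount {
    public static void main(String[] args) {
        // Exact count of doubles in [100.0, 101.0).
        long exact = Double.doubleToLongBits(101.0) - Double.doubleToLongBits(100.0);
        // The (Y - X) / 2^N * 2^52 approximation, with 2^N = 64 surrounding 100.
        double approx = (101.0 - 100.0) / 64.0 * Math.pow(2, 52);
        System.out.println(exact);          // 70368744177664, i.e. 2^46
        System.out.println((long) approx);  // 70368744177664 -- an exact match here,
        // since both endpoints are representable and lie between the same powers of two
    }
}
```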
Alex Martelli
@polygenelubricants: for many applications, the range between 0 and 1 is much, much more important and interesting than the range between 100 and 101; that's why it gets a bigger share of the values. For example, in physics, you often have to deal with ridiculously small values like Newton's gravitational constant at 6.67e-11. Having good precision there is more useful than between 100 and 101. Read http://floating-point-gui.de/ for more information.
Michael Borgwardt
You can also scale any number to between 0.0 and 1.0, keeping track of the scale separately, yielding less error in computation. It's nice when the entire number line can be mapped between two numbers!
codekaizen
+2  A: 

The Java double is an IEEE 754 binary64 number.

This means that we need to consider:

  1. The mantissa is 52 bits
  2. The exponent is an 11-bit number with a 1023 bias (i.e., with 1023 added to it)
  3. If the exponent is all 0 and the mantissa is non-zero, then the number is said to be non-normalized

This basically means there are a total of 2^62 - 2^52 + 1 possible double representations that, according to the standard, are between 0 and 1. Note that the 2^52 + 1 is there to remove the cases of the non-normalized numbers.

Remember that if the mantissa is positive but the exponent is negative, the number is positive but less than 1 :-)

For other numbers it is a bit harder, because the edge integer numbers may not be representable precisely in the IEEE 754 representation, and because other bits of the exponent are needed to represent the magnitude, so the larger the number, the fewer distinct values there are between it and its neighbor.
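A sketch of mine (not part of the answer) pulling those three fields out of the bit pattern; it shows a value with a positive mantissa field and a negative unbiased exponent coming out below 1, as described:

```java
public class FieldDump {
    public static void main(String[] args) {
        double d = 0.75;  // 1.5 * 2^-1
        long bits = Double.doubleToLongBits(d);
        long sign     = bits >>> 63;             // 1 sign bit
        long biasedE  = (bits >>> 52) & 0x7FF;   // 11 exponent bits, bias 1023
        long mantissa = bits & 0xFFFFFFFFFFFFFL; // 52 mantissa (fraction) bits
        System.out.println(sign);                       // 0
        System.out.println(biasedE - 1023);             // -1 (negative exponent -> d < 1)
        System.out.println(Long.toHexString(mantissa)); // 8000000000000 (the .5 fraction)
    }
}
```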

njsf
+11  A: 

Every double value whose representation is between 0x0000000000000000 and 0x3ff0000000000000 lies in the interval [0.0, 1.0]. That's (2^62 - 2^52) distinct values (plus or minus a couple depending on whether you count the endpoints).

The interval [1.0, 2.0] corresponds to representations between 0x3ff0000000000000 and 0x4000000000000000; that's 2^52 distinct values.

The interval [100.0, 101.0] corresponds to representations between 0x4059000000000000 and 0x4059400000000000; that's 2^46 distinct values.

There are no doubles between 10^100 and 10^100 + 1. Neither one of those numbers is representable in double precision, and there are no doubles that fall between them. The closest two double precision numbers are:

99999999999999982163600188718701095...

and

10000000000000000159028911097599180...
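A quick way to see why no double can fall strictly between 10^100 and 10^100 + 1 (my own check, not from the answer): `Math.ulp` gives the spacing between adjacent doubles around a value, and near 10^100 that spacing dwarfs 1.

```java
public class BigGap {
    public static void main(String[] args) {
        double x = 1e100;
        double gap = Math.ulp(x);         // spacing of adjacent doubles at 1e100
        System.out.println(gap > 1.0);    // true -- the gap is astronomically larger than 1
        System.out.println(x + 1.0 == x); // true -- adding 1 rounds straight back to x
    }
}
```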
Stephen Canon
+1, for a well-supported exact answer. (If you're picky about counting the endpoints, remember that +0.0 and -0.0 have distinct representations.)
Jim Lewis
+1, such a twist ending! Felt like I was reading an M. Night Shyamalan script!
polygenelubricants
+2  A: 

Others have already explained that there are around 2^62 doubles in the range [0.0, 1.0].
(Not really surprising: there are almost 2^64 distinct finite doubles; of those, half are positive, and roughly half of those are < 1.0.)

But you mention random number generators: note that a random number generator generating numbers between 0.0 and 1.0 cannot in general produce all these numbers; typically it'll only produce numbers of the form n/2^53 with n an integer (see e.g. the Java documentation for nextDouble). So there are usually only around 2^53 (+/-1, depending on which endpoints are included) possible values for the random() output. This means that most doubles in [0.0, 1.0] will never be generated.
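This grid structure is easy to observe (a sketch of mine; it relies on `nextDouble` being specified as a 53-bit integer divided by 2^53, per the `java.util.Random` docs): scaling any output by 2^53 should land exactly on an integer, since multiplying by a power of two is exact.

```java
import java.util.Random;

public class GridCheck {
    public static void main(String[] args) {
        Random r = new Random();
        for (int i = 0; i < 5; i++) {
            double d = r.nextDouble();
            double scaled = d * (1L << 53); // exact: power-of-two multiply
            // Every output is n / 2^53 for some integer n, so scaled is whole.
            System.out.println(d + " = " + (long) scaled + " / 2^53");
            System.out.println(scaled == Math.floor(scaled)); // true every time
        }
    }
}
```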

Mark Dickinson