views:

660

answers:

3

This is probably a question for an x86 FPU expert:

I am trying to write a function which generates a random floating point value in the range [min,max]. The problem is that my generator algorithm (the floating-point Mersenne Twister, if you're curious) only returns values in the range [1,2) - ie, I want an inclusive upper bound, but my "source" generated value is from an exclusive upper bound. The catch here is that the underlying generator returns an 8-byte double, but I only want a 4-byte float, and I am using the default FPU rounding mode of Nearest.

What I want to know is whether the truncation itself in this case will result in my return value being inclusive of max when the FPU internal 80-bit value is sufficiently close, or whether I should increment the significand of my max value before multiplying it by the intermediary random in [1,2), or whether I should change FPU modes. Or any other ideas, of course.

Here's the code I am currently using, and I did verify that 1.0f resolves to 0x3f800000:

float MersenneFloat( float min, float max )
{
    //genrand returns a double in [1,2)
    const float random = (float)genrand_close1_open2(); 
    //return in desired range
    return min + ( random - 1.0f ) * (max - min);
}

If it makes a difference, this needs to work on both Win32 MSVC++ and Linux gcc. Also, will using any versions of the SSE optimizations change the answer to this?

Edit: The answer is yes, truncation in this case from double to float is sufficient to cause the result to be inclusive of max. See Crashworks' answer for more.

A: 

If you do adjust the rounding so that does include both ends of the range, will those extreme values not be only half as likely as any of the non-extreme ones?

Pete Kirkham
It seems to me if I just use truncation, the answer is yes, but if I increment the max significand, the answer would be no.
Not Sure
A: 

With truncation, you are never going to be inclusive of the max.

Are you sure you really need the max? There is literally an almost 0 chance that you will land on exactly the maximum.

That said, you can exploit the fact that you are giving up precision and do something like this:

float MersenneFloat( float min, float max )
{
    double random = 100000.0; // just a dummy value
    while ((float)random > 65535.0)
    {
        //genrand returns a double in [1,2)
        double random = genrand_close1_open2() - 1.0; // now it's [0,1)
        random *= 65536.0; // now it's [0,65536). We try again if it's > 65535.0
    }
    //return in desired range
    return min + float(random/65535.0) * (max - min);
}

Note that, now, it has a slight chance of multiple calls to genrand each time you call MersenneFloat. So you have given up possible performance for a closed interval. Since you are downcasting from double to float, you end up sacrificing no precision.

Edit: improved algorithm

rlbond
Yes, I need the max to be inclusive (it's a library function contract). Would there be any advantage to doing it your way, as opposed to incrementing the significand of my max value before the multiplication?
Not Sure
That may work as well. Somewhere, however, you are either going to need to do a rejection test, or have a not-perfect distribution of values.An analogue of this problem is, say, generating an integer 0-256 from a random int 0-65535. It just doesn't map evenly.
rlbond
Actually, I just tried Crashworks test suggestion, and the truncation does in fact round up.
Not Sure
+4  A: 

The SSE ops will subtly change the behavior of this algorithm because they don't have the intermediate 80-bit representation -- the math truly is done in 32 or 64 bits. The good news is that you can easily test it and see if it changes your results by simply specifying the /ARCH:SSE2 command line option to MSVC, which will cause it to use the SSE scalar ops instead of x87 FPU instructions for ordinary floating point math.

I'm not sure offhand of what the exact rounding behavior is around the integer boundaries, but you can test to see what'll happen when 1.999.. gets rounded from 64 to 32 bits by eg

static uint64 OnePointNineRepeating = 0x3FF FFFFF FFFF FFFF // exponent 0 (biased to 1023), all 1 bits in mantissa
double asDouble = *(double *)(&OnePointNineRepeating);
float asFloat = asDouble;
return asFloat;

Edit, result: original poster ran this test and found that with truncation, the 1.99999 will round up to 2 both with and without /arch:SSE2.

Crashworks
Now why didn't I think of running that test among the others I ran :) I did discover that with truncation, the 1.99999 will round up to 2 both with and without /arch:SSE2. Thanks!
Not Sure
Glad to help -- I was curious what the result of the test would be myself.
Crashworks