views:

588

answers:

3

Given the start and the end of an integer range, how do I calculate a normally distributed random integer between this range?

I realize that the normal distribution goes into -+ infinity. I guess the tails can be cutoff, so when a random gets computed outside the range, recompute. This elevates the probability of integers in the range, but as long as the this effect is tolerable (<5%), it's fine.

public class Gaussian
{
    private static bool uselast = true;
    private static double next_gaussian = 0.0;
    private static Random random = new Random();

    public static double BoxMuller()
    {
        if (uselast) 
        { 
            uselast = false;
            return next_gaussian;
        }
        else
        {
            double v1, v2, s;
            do
            {
                v1 = 2.0 * random.NextDouble() - 1.0;
                v2 = 2.0 * random.NextDouble() - 1.0;
                s = v1 * v1 + v2 * v2;
            } while (s >= 1.0 || s == 0);

            s = System.Math.Sqrt((-2.0 * System.Math.Log(s)) / s);

            next_gaussian = v2 * s;
            uselast = true;
            return v1 * s;
        }
    }

    public static double BoxMuller(double mean, double standard_deviation)
    {
        return mean + BoxMuller() * standard_deviation;
    }

    public static int Next(int min, int max)
    {
        return (int)BoxMuller(min + (max - min) / 2.0, 1.0); 
    }
}

I probably need to scale the standard deviation some how relative to the range, but don't understand how.

Answer:

    // Will approximitely give a random gaussian integer between min and max so that min and max are at
    // 3.5 deviations from the mean (half-way of min and max).
    public static int Next(int min, int max)
    {
        double deviations = 3.5;
        int r;
        while ((r = (int)BoxMuller(min + (max - min) / 2.0, (max - min) / 2.0 / deviations)) > max || r < min)
        {
        }

        return r;
    }
+2  A: 

If the Box-Muller method returns a "standard" normal distribution, it will have mean 0 and standard deviation 1. To transform a standard normal distribution, you multiply your random number by X to get standard deviation X, and you add Y to obtain mean Y, if memory serves me correctly.

See the Wikipedia article's section on normalizing standard normal variables (property 1) for a more formal proof.


In response to your comment, the rule of thumb is that 99.7% of a normal distribution will be within +/- 3 times the standard deviation. If you need a normal distribution from 0 to 100 for instance, than your mean will be halfway, and your SD will be (100/2)/3 = 16.667. So whatever values you get out of your Box-Muller algorithm, multiply by 16.667 to "stretch" the distribution out, then add 50 to "center" it.


John, in response to your newest comment, I'm really not sure what is the point of the Next function. It always uses a standard deviation of 1 and a mean of halfway between your min and max.

If you want a mean of Y, with ~99.7% of the numbers in the range -X to +X, then you just call BoxMuller(Y, X/3).

Mark Rushakoff
Yes, this is what the BoxMuller(double mean, double standard_deviation) achieves above. The problem, however, is that the Next(int min, int max) returns values very close to the half way of the range. This is because I don't understand how to "scale" the deviation correctly.
So for 3.5 standard deviations, it would be "return (int)BoxMuller(min + (max - min) / 2.0, (max - min) / 2.0 / 3.5);"?
Can I just comment that +/- 3 times standard deviation gives you not 97% but 99.7%.+/-sigma: ~68%+/-2sigma: ~95%+/-3sigma: ~99.7%http://en.wikipedia.org/wiki/68-95-99.7_rule
DmitryK
Thanks for pointing that out, Dmitry -- my statistics were a bit rusty, apparently. I updated the answer.
Mark Rushakoff
A: 

Well, the -2*sigma..+2*sigma will give you 95% of the bell curve. (check the "Standard deviation and confidence intervals" section in the already mentioned wiki article).

So modify this piece:

return (int)BoxMuller(min + (max - min) / 2.0, 1.0);

and change 1.0 (standard deviation) to 2.0 (or even more if you want more than 95% coverage)

return (int)BoxMuller(min + (max - min) / 2.0, 2.0);
DmitryK
Now I understand what I want to formulate thanks to you.. I want the -2*sigma .. +2*sigma to be at the beginning and the end of the range (min, max) respectively.
Ah, I understand now. So you want to step out from your mean value left and right and "hit" either min or max with a 95% confidence.In this case - you keep your mean as you have it (min + (max-min)/2), but you need to calculate your sigma (standard deviation). Stepping out by 2*sigma gives us that 95% interval. So the length of this interval is 4*sigma. But we can also calculate it as (max-min). Which gives us sigma=(max-min)/4.Can you please try that?
DmitryK
"hit" within the min..max range with the 95% confidence - just to be precise in my wording.
DmitryK
This works also, too bad I can only make one answer as correct. Stackoverflow should add "assisted" or "shared" answers.
A: 

You need to work out how much of your normal distribution must be within your 'range'. You can't say all of the distribution should be within the range, because as you note the distribution runs to infinity.

Instead you need to specify that, e.g. 99% of the distribution lives within your range, or 99.99% lives within your range. This will in turn allow you to calculate your required standard deviation. This is quite important as it will strongly influence the chance of distribution of values near your range boundaries.

Obviously your 'mean' value will be the midpoint of your range.

Kirk Broadhurst