views:

387

answers:

3

Given an array of size n I want to generate random probabilities for each index such that Sigma(a[0]..a[n-1])=1

One possible result might be:

0     1     2     3     4
0.15  0.2   0.18  0.22  0.25

Another perfectly legal result can be:

0     1     2     3     4
0.01  0.01  0.96  0.01  0.01

How can I generate these easily and quickly? Answers in any language are fine, Java preferred.

+15  A: 

Get n random numbers, calculate their sum and normalize the sum to 1 by dividing each number with the sum.

Kobi
Nice :) didn't think of that... תודה!
Yuval A
This introduces bias. You cannot sample uniformly from a simplex this way.
dreeves
@dreeves - can you elaborate?
Yuval A
Figure 2 of this paper -- http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf -- gives a visual depiction of the bias in the n=3 case of the normalization method you propose. I believe that kohomologie's answer is correct (though I didn't check the code they included).
dreeves
+4  A: 

The task you are trying to accomplish is tantamount to drawing a random point from the N-dimensional unit simplex.

http://en.wikipedia.org/wiki/Simplex#Random_sampling might help you.

A naive solution might go as following:

public static double[] getArray(int n)
    {
        double a[] = new double[n];
        double s = 0.0d;
        Random random = new Random();
        for (int i = 0; i < n; i++)
        {
           a [i] = 1.0d - random.nextDouble();
           a [i] = -1 * Math.log(a[i]);
           s += a[i];
        }
        for (int i = 0; i < n; i++)
        {
           a [i] /= s;
        }
        return a;
    }

To draw a point uniformly from the N-dimensional unit simplex, we must take a vector of exponentially distributed random variables, then normalize it by the sum of those variables. To get an exponentially distibuted value, we take a log of uniformly distributed value.

kohomologie
In most languages, you should create `Random` only once, or you will get not random results (and in many cases - the same number over and over). I'm also concerned about the use of `log` - can you please explain why it's there?
Kobi
+1 for good reference, but I think `nextDouble()` already adjusts for uniform distribution: http://java.sun.com/javase/6/docs/api/java/util/Random.html#nextDouble()
trashgod
Kobi, thanks for pointing out the `new Random()` thing. As for `log` -- I have edited my post to include more thorough explanation.
kohomologie
trashgod, nextDouble() returns an uniformly distributed value from [0,1). So when we take n such values, we get a random point in a parallelotope. If we divide this point's coordinates by Manhattan distance between this point and the origin, I'm quite unsure that we'll get an uniformly distributed point in the unit simplex.
kohomologie
@kohomologie: thank you for clarifying; the algorithm presumes uniform distribution and uses -ln to get the required exponential distribution.
trashgod
A: 

If you want to generate values from a normal distribution efficiently, try the Box Muller Transformation.

James McLeod
Good tip, but I believe it isn't germane.
dreeves