If you are looking for normally distributed numbers with as little correlation as possible, and need to be rigorous* about this, I would suggest you take the following mathematical approach and translate it into code.
(*rigorous: the problem with other approaches is that you can get "long tails" in your distributions -- in other words, it is rare but possible to have outliers that are very different from your expected output)
- Generate N-1 independent and identically distributed (IID) Gaussian random variables v1, v2, ... vN-1 to match the N-1 degrees of freedom of your problem.
- Create a column vector V where V = [0, v1, v2, ... vN-1]T
- Use a fixed weighting matrix W, where W is an orthonormal matrix** whose top row (and, for the symmetric W given below, first column) is [1 1 1 1 1 1 1 ... 1] / sqrt(N).
- Your output vector is the product WV + SU/N where S is the desired sum and U is the column vector of 1's. In other words, the i'th output variable = the dot product of (row #i of matrix W) and column vector V, added to S/N.
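The standard deviation of each output variable will be sqrt((N-1)/N) * the standard deviation of the input random variables. Each row of W has unit length and its first entry, 1/sqrt(N), multiplies the 0 in V, so the output variance is (1 - 1/N) times the input variance.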
**orthonormal matrix: this is the hard part. I put in a question at math.stackexchange.com, and it turns out there's a simple matrix W that works. It can be defined algorithmically with only 3 distinct values, so that you don't actually have to construct the matrix.
W is the Householder reflection generated by v-w, where v = [sqrt(N), 0, 0, 0, ... ] and w = [1 1 1 1 1 ... 1]. It can be defined by:
W(1,i) = W(i,1) = 1/sqrt(N)
W(i,i) = 1 - K for i >= 2
W(i,j) = -K for i,j >= 2, i != j
where K = 1/(sqrt(N)*(sqrt(N)-1))
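The definition above translates fairly directly into code. Here is a minimal Python sketch; the language and the names `householder_w`, `apply_w` and `gaussian_with_sum` are my own choices for illustration. Because V starts with 0 and W has only three distinct values, the product WV should collapse to: first output = T/sqrt(N), every other output = v_i - K*T, where T is the sum of the v's. The dense matrix is built only to cross-check that shortcut.

```python
import math
import random


def householder_w(n):
    """Dense N x N matrix W (0-based indices), built only for checking."""
    root_n = math.sqrt(n)
    k = 1.0 / (root_n * (root_n - 1.0))
    w = [[-k] * n for _ in range(n)]       # off-diagonal entries: -K
    for i in range(n):
        w[i][i] = 1.0 - k                  # diagonal entries: 1 - K
    for i in range(n):
        w[0][i] = w[i][0] = 1.0 / root_n   # first row and first column
    return w


def apply_w(v):
    """Return W * [0, v[0], ..., v[-1]] in O(N) without building W."""
    n = len(v) + 1
    root_n = math.sqrt(n)
    k = 1.0 / (root_n * (root_n - 1.0))
    t = sum(v)
    return [t / root_n] + [x - k * t for x in v]


def gaussian_with_sum(n, total, sigma=1.0, rng=random):
    """Return n Gaussian values that sum to `total` (up to rounding)."""
    if n < 2:
        raise ValueError("need n >= 2 (K is undefined for n = 1)")
    v = [rng.gauss(0.0, sigma) for _ in range(n - 1)]  # N-1 IID inputs
    return [y + total / n for y in apply_w(v)]         # WV + S*U/N


if __name__ == "__main__":
    xs = gaussian_with_sum(10, 5.0, rng=random.Random(1))
    print(xs)
    print("sum:", sum(xs))
```

`sigma` here is the standard deviation of the inputs, not of the outputs. Passing a seeded `random.Random` instance as `rng` makes a run repeatable.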
The problem with Mark's approach:
"Why not just generate the right number of uniformly distributed random numbers, tot them up and scale?"
is that if you do this, you get a "long tail" distribution. Here's an example in MATLAB:
>> X = rand(100000,10);
>> Y = X ./ repmat(sum(X,2),1,10);
>> plot(sort(Y))
I've generated 100,000 sets of N=10 numbers in matrix X, and created matrix Y where each row of Y is the corresponding row of X divided by its sum (so that each row of Y sums to 1.0).
Plotting the sorted values of Y (each column sorted separately) shows that every column has approximately the same cumulative distribution.
A true uniform distribution would yield a straight line from 0 to the maximum value. You'll notice that it's sort of vaguely similar to a true uniform distribution, except at the end where there's a long tail. There's an excess of numbers generated between 0.2 and 0.5. The tail gets worse for larger values of N, because although the average value of the numbers goes down (mean = 1 / N), the maximum value stays at 1.0: the vector consisting of 9 values of 0.0 and 1 value of 1.0 is valid and can be generated this way, but is pathologically rare.
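For readers without MATLAB, the same experiment can be approximated in plain Python. This is only a sketch: the function names are mine, and the 2/N threshold is my own choice of yardstick (a true uniform distribution with mean 1/N would run from 0 to 2/N, so anything above that belongs to the tail).

```python
import random


def normalized_uniform_rows(rows, n, rng=random):
    """Draw `rows` vectors of n uniform values and scale each to sum to 1."""
    out = []
    for _ in range(rows):
        # 1 - random() lies in (0, 1], so the row sum can never be zero.
        x = [1.0 - rng.random() for _ in range(n)]
        s = sum(x)
        out.append([xi / s for xi in x])
    return out


def tail_fraction(samples, threshold):
    """Fraction of all values in `samples` that exceed `threshold`."""
    values = [y for row in samples for y in row]
    return sum(1 for y in values if y > threshold) / len(values)


if __name__ == "__main__":
    n = 10
    ys = normalized_uniform_rows(100000, n, random.Random(0))
    print("largest value seen:", max(max(row) for row in ys))
    print("fraction above 2/N:", tail_fraction(ys, 2.0 / n))
```

Sorting a single column of `ys` and plotting it should give roughly the same curve as the MATLAB `plot(sort(Y))` call.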
If you don't care about this, go ahead and use this method. And there are probably ways to generate "almost"-uniform or "almost"-gaussian distributions with desired sums, that are much simpler and more efficient than the one I describe above. But I caution you to be careful and understand the consequences of the algorithm you choose.
One fixup that leaves things sort-of-uniformly distributed without the long tail is as follows:
- Generate a vector V of N uniformly distributed random numbers from 0.0 to 1.0.
- Find their sum S and their maximum value M.
- If S < k*M (maximum value is too much of an outlier), go back to step 1. I'm not sure what value to use for k, maybe k = N/2?
- Output the vector V*Sdesired/S
Example in MATLAB for N=10:
>> X = rand(100000,10);
>> Y = X ./ repmat(sum(X,2),1,10);
>> i = sum(X,2)>(10/2)*max(X,[],2);
>> plot(sort(Y(i,:)))
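The fixup steps can also be sketched in Python as a simple rejection loop. The function name `uniform_with_sum` and the default k = N/2 are assumptions for illustration; as noted above, I'm not sure what the best value of k is.

```python
import random


def uniform_with_sum(n, total, k=None, rng=random):
    """Return n roughly uniform values summing to `total`, rejecting outliers."""
    if k is None:
        k = n / 2.0  # tentative value from the text, not a tuned constant
    if not 0.0 <= k < n:
        # S can never exceed N*M, so k >= N would (almost) never accept.
        raise ValueError("k must satisfy 0 <= k < n")
    while True:
        v = [rng.random() for _ in range(n)]
        s = sum(v)
        m = max(v)
        # Accept only if the maximum is not too much of an outlier.
        if s > 0.0 and s >= k * m:
            return [x * total / s for x in v]


if __name__ == "__main__":
    xs = uniform_with_sum(10, 1.0, rng=random.Random(2))
    print(xs)
    print("sum:", sum(xs), "max:", max(xs))
```

One consequence of the acceptance test is that no output can exceed Sdesired/k, so with k = N/2 every value stays at or below 2*Sdesired/N. A larger k trims the tail harder but rejects more candidate vectors.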