Is there any way to randomly generate a set of positive numbers such that they have a desired mean and standard deviation?

I have an algorithm to generate numbers with a Gaussian distribution, but I don't know how to deal with negative numbers in a way that preserves the mean and standard deviation.
It looks like a Poisson distribution might be a good approximation, but it takes only a mean.

EDIT: There's been some confusion in the responses so I'll try to clarify.

I have a set of numbers that give me a mean and a standard deviation. I would like to generate an equally sized set of numbers with an equivalent mean and standard deviation. Normally, I would use a Gaussian distribution to do this; however, in this case I have an additional constraint that all values must be greater than zero.

The algorithm I'm looking for doesn't need to be gaussian-based (judging by the comments so far, it probably shouldn't be) and doesn't need to be perfect. It doesn't matter if the resulting number set has a slightly different mean/standard deviation -- I just want something that will usually be in the ballpark.

+5  A: 

You could use a log-normal distribution.
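
As a rough sketch of how this could look in Python: a log-normal is the exponential of a normal variable, so the target mean m and standard deviation s first have to be converted into the parameters mu and sigma of that underlying normal. The function names `lognormal_params` and `positive_samples` are made up for this illustration; `random.lognormvariate` is the standard-library call.

```python
import math
import random

def lognormal_params(mean, sd):
    """Return (mu, sigma) of the underlying normal so that the
    log-normal has the requested mean and standard deviation."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must both be positive")
    sigma2 = math.log(1.0 + (sd / mean) ** 2)   # variance in log space
    mu = math.log(mean) - sigma2 / 2.0          # mean in log space
    return mu, math.sqrt(sigma2)

def positive_samples(mean, sd, n, rng=random):
    """Draw n strictly positive values from the matching log-normal."""
    mu, sigma = lognormal_params(mean, sd)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]
```

The sample mean and standard deviation of the output will only be close to the targets, not equal to them, which seems to be acceptable here.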

David Norman
+5  A: 

First, you can't generate only positive values from a Gaussian distribution.

Second, am I understanding correctly that you are trying to generate a random distribution with given mean and standard deviation? Will any distribution do? If so, let mean = m and standard deviation = s. I am assuming that m - s > 0.

let n = random integer modulo 2;
if n equals 0 return m - s
else return m + s

The values returned by this process will have mean m and standard deviation s.
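
A minimal Python version of the pseudocode above, as a sketch only (the name `two_point` and the optional `rng` argument are assumptions added for illustration):

```python
import random

def two_point(m, s, rng=random):
    """Return m - s or m + s, each with probability 1/2.

    The result is positive only when m - s > 0.
    """
    return m - s if rng.randrange(2) == 0 else m + s
```

Both outcomes sit exactly s away from m and are equally likely, so the mean is m and the variance is s^2.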

Jason
I doubt your proposition will satisfy his needs, but I have to give it +1 for an interesting answer to the question. That being said, your answer has a flaw: if m < s, your distribution will not be positive.
Mathias
@Mathias: I made the statement "I am assuming that `m - s > 0`."
Jason
That is an interesting answer. Unfortunately, in my case it's not always true that m > s. I'd also like a little more variation to the generated values, though I didn't mention that in the question. +1 for a novel solution, though.
Whatsit
@Jason: I tried to keep the spirit of your solution (the simplest distribution satisfying the requirements) and worked out a general solution for any m and s below...
Mathias
+3  A: 

You may be looking for log-normal distribution, as David Norman suggested, or maybe exponential, binomial, or some other distribution. If you have an algorithm to generate one distribution, it is probably not good for generating numbers conforming to another distribution. But only you know how your numbers are really distributed.

With normal distribution, the random variable's range is from negative infinity to positive infinity, so if you're looking for positive numbers only, then it is not Gaussian.

Different distributions also have their own properties. For example, with the Poisson distribution the variance is always equal to the mean, so the standard deviation is the square root of the mean. (That's why your library function doesn't ask for a standard deviation parameter, only the mean.)

In the worst case, you could generate a random real number between 0 and 1 and map it through the inverse of the cumulative distribution function on your own. (Depending on the distribution, this may be much easier said than done.)
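
One standard way to turn a uniform number between 0 and 1 into a draw from another distribution is inverse transform sampling: pass the uniform value through the inverse of the target's cumulative distribution function. Here is a small sketch for the exponential distribution, chosen only because its inverse CDF has a closed form. The function names are made up for this example, and note that an exponential always has a standard deviation equal to its mean, so by itself it only fits the question when s = m.

```python
import math
import random

def exponential_inverse_cdf(u, mean):
    """Inverse CDF of the exponential distribution with the given mean.

    u must lie in [0, 1); the result is never negative.
    """
    return -mean * math.log(1.0 - u)

def sample_exponential(mean, rng=random):
    u = rng.random()   # uniform in [0.0, 1.0)
    return exponential_inverse_cdf(u, mean)
```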

azheglov
++ Simplest way to do this is 1) take the log of each original data point, 2) get the mean and sigma of that, 3) generate gaussian normal random numbers with that mean and sigma, and 4) take exp of each number. The results should be similar to what you started with. (To generate a gaussian random number, a simple way is to add up 12 uniform random numbers in the range +/- 0.5.)
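
The four steps in this comment might look like the following in Python. This is a sketch, assuming every original data point is strictly positive and there are at least two of them; the name `resample_lognormal` is invented here, and `random.gauss` stands in for the sum-of-12-uniforms trick.

```python
import math
import random
import statistics

def resample_lognormal(data, rng=random):
    """Generate len(data) positive values resembling `data`."""
    logs = [math.log(x) for x in data]   # step 1: log of each point
    mu = statistics.mean(logs)           # step 2: mean and sigma
    sigma = statistics.stdev(logs)       #         of the logs
    # steps 3 and 4: draw Gaussian values in log space, then exponentiate
    return [math.exp(rng.gauss(mu, sigma)) for _ in data]
```

This matches the mean and sigma of the logs rather than of the raw values, so the raw mean and standard deviation come out similar to the originals but not identical.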
Mike Dunlavey
+1  A: 

If I understand you correctly, you want to generate random numbers from a distribution with positive support. There are many possible choices. The simplest is the

chi-square: http://en.wikipedia.org/wiki/Chi-square%5Fdistribution (which is just the sum of k squared standard Gaussians, where k is the number of degrees of freedom)

Many asymmetric distributions (exponential, Weibull, Pareto, inverse Gaussian, log-normal, gamma)

The distributions from the skew family (skew-normal, skew-Student, ...), with the caveat that these are supported on the whole real line and so can still produce negative values

Apart from the skew family, all the above distributions are such that a random number drawn from any of them will never be negative.

+3  A: 

I couldn't resist - I really like Jason's angle but wasn't happy that his answer only covers cases where m > s, so I worked out a general solution following his idea.
The simplest distribution with given m, s and non-negative terms is

with probability p, return 0
with probability (1-p), return m / (1-p)
where (1-p) = m^2 / (m^2 + s^2)

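Proof: for a distribution X with two outcomes, lowX with probability p and highX with probability (1-p),
m = E[X] = p * lowX + (1-p) * highX
s^2 = Variance(X) = E[X^2] - E[X]^2 = p * lowX^2 + (1-p) * highX^2 - m^2

Set lowX to 0 and solve for highX and p. The first equation gives highX = m / (1-p); substituting into the second gives m^2 / (1-p) = m^2 + s^2, i.e. (1-p) = m^2 / (m^2 + s^2).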
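
In Python, the two-outcome distribution above could be sketched like this (the name `zero_or_high` and the `rng` argument are assumptions for illustration). Note that the low outcome is exactly 0, so the values are non-negative rather than strictly positive.

```python
import random

def zero_or_high(m, s, rng=random):
    """Return 0 with probability p, else m / (1 - p),
    where (1 - p) = m^2 / (m^2 + s^2)."""
    keep = m * m / (m * m + s * s)   # this is (1 - p)
    return m / keep if rng.random() < keep else 0.0
```

Unlike the m - s / m + s version, this works for any m > 0 and s > 0, including m < s.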

Mathias
That is beautiful.
Jason
Thank you - given the spirit of your answer, I thought you would appreciate :)
Mathias
+4  A: 

Why not use a resampling method? If you have n numbers in your sample, just take n random draws from the sample, with replacement. The resulting set will have expected mean and variance about the same as your original sample, but it will usually be slightly different.

This said, without knowing why you need more random numbers, it's impossible to say what the right answer is. One wonders if you're trying to solve the wrong problem...
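
For reference, the resampling idea is a one-liner with the standard library. A sketch, with `resample` as an invented name:

```python
import random

def resample(sample, rng=random):
    """Draw len(sample) values from `sample`, with replacement."""
    return rng.choices(sample, k=len(sample))
```

Every returned value comes from the original set, so if the original numbers are all positive, the new ones are too.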

Harlan
Resampling is an interesting suggestion. In his initial statement, Whatsit didn't say that he had a sample; he only mentioned he had a mean and a standard deviation. Drawing from the sample will not only replicate the mean and variance, it will also by definition match the shape of the distribution... It would be a good idea if Whatsit wants to run simulations.
Mathias
+2  A: 

You could use any distribution which has positive support AND can be specified by mean and variance. For example,

  • one-parameter distributions won't work in general. For example, chi-square won't work unless your variance is always double your mean. Similarly, exponential won't work unless your variance equals your mean squared.
  • some two-parameter distributions won't work in some cases. Binomial distribution won't work unless variance is less than your mean. Similarly the non-central chi-square won't work unless your variance is greater than 2 times your mean and less than 4 times your mean!
  • However log-normal and gamma will work in all cases.
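
To illustrate the gamma case, here is a sketch of matching a gamma distribution to a given mean and standard deviation. A gamma with shape k and scale theta has mean k * theta and variance k * theta^2, which gives k = (m/s)^2 and theta = s^2 / m. The function names are made up; `random.gammavariate(alpha, beta)` is the standard-library call, with alpha as the shape and beta as the scale.

```python
import random

def gamma_params(mean, sd):
    """Return (shape, scale) of the gamma distribution with the
    requested mean and standard deviation (both must be positive)."""
    shape = (mean / sd) ** 2
    scale = sd * sd / mean
    return shape, scale

def gamma_samples(mean, sd, n, rng=random):
    shape, scale = gamma_params(mean, sd)
    return [rng.gammavariate(shape, scale) for _ in range(n)]
```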
Apprentice Queue
A: 

what the heck are you clowns talking about? the standard normal has mean zero, but that's a special case of the Gaussian distribution, which has parameters mean and standard deviation. as the mean increases, with sd held constant, the probability of generating any numbers below zero diminishes to zero. you can absolutely have a gaussian distribution with no negative numbers.

Laurie
Yes, for fixed sd and increasing mean the probability approaches 0; however, it is never 0. By design, lognormal and gamma distributions will never generate a negative number.
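
To put a number on it: for a Gaussian with mean m and standard deviation s, the chance that a single draw is negative is the normal CDF evaluated at 0. A quick sketch using `statistics.NormalDist` (Python 3.8+); `prob_negative` is a name invented here.

```python
from statistics import NormalDist

def prob_negative(mean, sd):
    """Probability that one draw from N(mean, sd) falls below zero."""
    return NormalDist(mean, sd).cdf(0.0)
```

With the mean 3 standard deviations above zero this is roughly 0.13%, so a large enough set of draws will still contain a few negative values.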
Apprentice Queue