views:

359

answers:

7

I am looking for a random number generator that can be biased. For instance, say I want a random number between 1-5, with the probability being:

1: Comes up 20% of the time
2: Comes up 10% of the time
3: Comes up 40% of the time
4: Comes up 25% of the time
5: Comes up 5% of the time

Is there anything in the standard library, or other libraries out there that would do this? Alternatively, is there an efficient way to do this myself?

+5  A: 

For your problem, just pick a random element from this list uniformly:

[1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5]

In general, check this answer: http://stackoverflow.com/questions/1761626/weighted-random-numbers


In TR1 and C++0x, there is <random> header which contains the discrete_distribution class to generate such numbers, among others.

You may also want to check out GSL which contains much more random distributions (and random number generators) than the standard <random> library. (But note that GSL uses GPLv3.)

KennyTM
Maybe I should have explained my actual situation a little better. What I really need will be a random number between 1-50,000ish. Creating a list that long seems unnecessary, and unneeded. Sorry for the confusion.
cmptrer
That's why good engineering requires a proper set of requirements. You asked for the simple case and you got the (correct) simple answer.
Barry Wark
+5  A: 

Best way's probably to just take the normal unbiased random generator then return based on the interval its value falls into.

Just an if statement that gives 1 for 0:0.2, 2 for 0.2:0.3, 3 for 0.3:0.7, 4 for 0.7:0.95 and 5 for 0.95:1. Best to make either the lower or upper limit of the interval inclusive and the other exclusive.

int biasedRandom(){
double i = randomNumber();
if(i<= 0.2){return 1;}
else if(i <= 0.3){return 2;}
else if(i <= 0.7){return 3;}
else if(i <= 0.95){return 4;}
else{return 5;}
}

Something like that.

AaronM
If you have a whole lot of intervals to check, what you should do is come up with a cumulative distribution array (wasn't sure what to call it) and do a binary search on it each time to find the the number that was generated.
Justin Peel
Seems like this generator will return 5 always. Double/floats needed and some other math unless randomNumber() returns a value 0-1.
Xorlev
Yeah you were right it would have given 5 every time, didn't mean to make i an int, that was a mistake. Thanks for pointing it out.
AaronM
A: 

Why don't you just use a regular random number generator that return number between 0.0 and 1.0, and wrap it with another function that returns a number according to your requirements?

like

double biased (double seed) {
if (seed >= 0.0 && seed <0.2) return 1;
else if  ...
}
DaClown
I would not use `seed` as an identifier for a randomly generated number, it's confusing...
Matthieu M.
Why not? The C rand function also takes a seed, in most cases the system time. So biased is a function that generates a random number based on a seed.
DaClown
A: 

Throw a random real number x in [0,1], if 0< x<0.2 return 1, if 0.2<x <0.3 return 2, etc.

See here for the general problem.

leonbloy
A: 

Kenny gave an appropriate answer tailored to your particular frequency distribution.

The more general answer works with a CDF - Cumulative Distribution Function - for the data, and uses a uniform random number to pick a value within the distribution.

Jonathan Leffler
+3  A: 

What you are describing is the implementation of a random number generator that draws from a particular probability distribution. For example, drawing numbers from a Gaussian distribution should draw random numbers such that the probability of a particular draw, x is proportional to alt text.

In general, the approach is to draw from a uniform random distribution and then pick the value of the desired distribution's cumulative distribution function (CDF) at that drawn location. In the case of a Normal Gaussian, draw a random number, x from a uniform distribution (this is what standard random number generators should give) and then choose alt text as the random, Gaussian distributed value. For your case, the CDF you describe is a piece-wise continuous stair-step function which could be implemented using any of the many (correct) answers you have already received.

Of course, this is all trivia. What you should be doing is using a library that already handles this for you. Statistics and random number generation are not trivial and there's no need to re-invent the wheel. See Neil's answer (and check out the Boost random number library).

Barry Wark
+6  A: 

The Boost random number library provides the ability to specify different shaped distributions for your generator. It's a great library - see http://www.boost.org/doc/libs/1_42_0/libs/random/index.html.

anon
Yes, don't reinvent the wheel!
Barry Wark
Someday I am going to start using Boost. Someday.
Michael Myers
It just lacks a proper documentation... would be great to know what the required concepts for the engine and the distribution are so as to be able to craft one's own without reverse engineering the library :(
Matthieu M.
Libraries are good, but I think it's safe to say he's a pretty new programmer, in which case, they're more of a hindrance as you need the experience at first to get used to the logic.
AaronM
@AaronM Writing your own random number generator is fraught with problems - using an existing, tested one is a much better bet.
anon
@mmeyers, @matthieu: Check out TR1's random library. (Random link from Google: http://www.johndcook.com/cpp_TR1_random.html.) Likely bundled with your compiler and likely a cousin of Boost, with higher level of documentation.
Potatoswatter