tags:

views:

542

answers:

4

I'm trying to implement a weighted random numbers. I'm currently just banging my head against the wall and cannot figure this out.

In my project (Hold'em hand-ranges, subjective all-in equity analysis), I'm using Boost's random -functions. So, let's say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost's mersenne twister generator works like a charm for this. However, I want the pick to be weighted for example like this:

1 (90% chance to be picked up)
2 (56% chance to be picked up)
3 ( 4% chance to be picked up)

Does Boost have some sort of functionality for this?

+6  A: 

There is a straightforward algorithm for picking an item at random, where items have individual weights:

1) calculate the sum of all the weights

2) pick a random number between 0 and less than the sum weights

3) go through the items in order, subtracting their weight from your random number, until you get the item where the random number is less than that item's weight

Pseudo-code illustrating this:

int sum_of_weight = 0;
for(int i=0; i<num_choices; i++) {
   sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for(int i=0; i<num_choices; i++) {
  if(rnd < choice_weight[i])
    return i;
  rnd -= choice_weight[i];
}
assert(!"should never get here");

This should be straightforward to adapt to your boost containers and such.

If you do not know how many items in the list, then there's a very neat algorithm called reservoir sampling that can be adapted to be weighted.

Will
As an optimization you could use cumulative weights and use a binary search. But for only three different values this is probably overkill.
sellibitze
A: 

Choose a random number on [0,1), which should be the default operator() for a boost RNG. Choose the item with cumulative probability density function >= that number:

template <class It,class P>
It choose_p(It begin,It end,P const& p)
{
    if (begin==end) return end;
    double sum=0.;
    for (It i=begin;i!=end;++i)
        sum+=p(*i);
    double choice=sum*random01();
    for (It i=begin;;) {
        choice -= p(*i);
        It r=i;
        ++i;
        if (choice<0 || i==end) return r;
    }
    return begin; //unreachable
}

Where random01() returns a double >=0 and <1. Note that the above doesn't require the probabilities to sum to 1; it normalizes them for you.

p is just a function assigning a probability to an item in the collection [begin,end). You can omit it (or use an identity) if you just have a sequence of probabilities.

wrang-wrang
A: 

A really naive way would be to fill an array of 100 numbers and put 90 1's, 6 2's and 4 3's then random(100) and use that number as an index for the array. To make it more interesting you can "shuffle" the array and see if that helps.

dasickis
+1  A: 

Build a bag (or std::vector) of all the items that can be picked.
Make sure that the number of each items is proportional to your weighting.

Example:

  • 1 60%
  • 2 35%
  • 3 5%

So have a bag with 100 items with 60 1's, 35 2's and 5 3's.
Now randomly sort the bag (std::random_shuffle)

Pick elements from the bag sequentially until it is empty.
Once empty re-randomize the bag and start again.

Martin York
if you have a bag of red and blue marbles and you select a red marble from it and _don't_ replace it is the probability of selecting another red marble still the same? In the same way, your statement "Pick elements from the bag sequentially until it is empty" produces a totally different distribution than intended.
ldog
@ldog: I understand your argument but we are not looking for true randomness we are looking for a particular distribution. This technique guarantees the correct distribution.
Martin York
my point exactly is that you do not correctly produce distribution, by my previous argument. consider the simple counter example, say you put you have an array of 3 as `1,2,2` producing 1 1/3 of the time and 2 2/3. Randomize the array, pick the first, lets say a 2, now the next element you pick follows the distribution of 1 1/2 the time and 2 1/2 the time. Savvy?
ldog