tags:

views:

58

answers:

2

I want a random number generator with non-uniform distribution, ie:

// prints 0 with 0.1 probability, and 1 with 0.9 probability
echo probRandom(array(10, 90));

This is what I have right now:

/**
 * method to generated a *not uniformly* random index
 *
 * @param array $probs int array with weights 
 * @return int a random index in $probs
 */
function probRandom($probs) {
    $size = count($probs);

    // construct probability vector
    $prob_vector = array();
    $ptr = 0;
    for ($i=0; $i<$size; $i++) {
        $ptr += $probs[$i]; 
        $prob_vector[$i] = $ptr;
    }

    // get a random number
    $rand = rand(0, $ptr);
    for ($i=0, $ret = false; $ret === false; $i++) {
        if ($rand <= $prob_vector[$i])
            return $i;
    }   
}

Can anyone think of a better way? Possibly one that doesn't require me to do pre-processing?

+1  A: 

In your solution you generate an accumulated probability vector, which is very useful.

I have two suggestions for improvement:

  • if $probs are static, i.e. it's the same vector every time you want to generate a random number, you can preprocess $prob_vector just once and keep it.
  • you can use binary search for the $i (Newton bisection method)

EDIT: I now see that you ask for a solution without preprocessing.

Without preprocessing, you will end up with worst case linear runtime (i.e., double the length of the vector, and your running time will double as well).

Here is a method that doesn't require preprocessing. It does, however, require you to know a maximum limit of the elements in $probs:

Rejection method

  • Pick a random index, $i and a random number, X (uniformly) between 0 and max($probs)-1, inclusive.
  • If X is less than $probs[$i], you're done - $i is your random number
  • Otherwise reject $i (hence the name of the method) and restart.
Bjarke Ebert
+1  A: 

If you know the sum of all elements in $probs, you can do this without preprocessing.

Like so:

$max = sum($probs);
$r = rand(0,$max-1);
$tot = 0;
for ($i = 0; $i < length($probs); $i++) {
    $tot += $probs[$i];
    if ($r < $tot) {
        return $i;
    }
}

This will do what you want in O(N) time, where N is the length of the array. This is a firm lower bound on the algorithmic runtime of such an algorithm, as each element in the input must be considered.

The probability a given index $i is selected is $probs[$i]/sum($probs), given that the rand function returns independent uniformly distributed integers in the given range.

Borealid