views:

287

answers:

6

Edit: I've rewritten the question in hopes that the goal is a little clearer.

This is an extended question to this question here, and I really like the function provided in this answer.

In the answer above, one is able to set the probability of hitting an extreme, with higher numbers producing a higher probability of getting lower numbers and vice-versa. The issue is that I must set the probabilities for 3 groups. Those groups are Lowest Value (LV), Highest Value (HV), and Middle Value (MV). However, to simplify the request, we can consider EVP=HVP=LVP.

Given any range, the HV/LV should appear based on the specified EVP and, as you progress/degress through the range from each extreme, the probability of the next value in the range will increase, or decrease, based on the distance between EVP and MVP.

Using an example range of 1-6, with 1 and 6 being weighted at 5% (EVP), the probability spread would be 1/6 is 5%, 2/4 is 15%, and 3/4 is 30% (MVP), totalling 100%. The reverse should also be possible, swapping EVP and MVP should produce an inverse of the graph below.

Here's an image that I hope will convey the results expected from the given example.

Middle weighted:

Middle Weighted Graph

Bonus: It would be most excellent if I was able to set HVP and LVP separately producing a result similar to the graph below (Note: The graph is not accurate to specification above).

Middle weighted (bonus):

Middle Weighted Bonus Graph

Thanks!

+2  A: 

Assuming you can cope with whole numbers for the percentages, just assign each value between 0 and 99 a result - e.g. 0-9 could have a result of 1 and 95-99 could have a result of 6 (to give your 10%=1 and 5%=6 scenario). Once you've got that translation function (however you achieve that - there are various approaches you could use) you just need to generate a random number in the range 0-99 and translate it to the result.

Your question isn't really clear in terms of the code you want (or even which language - C# or PHP?) but hopefully that will help.

Here's some C# code which will let you get any bias you like, within reason - you don't have to express it as percentages, but you can do:

static int BiasedRandom(Random rng, params int[] chances)
{
    int sum = chances.Sum();
    int roll = rng.Next(sum);
    for (int i = 0; i < chances.Length - 1; i++)
    {
        if (roll < chances[i])
        {
            return i;
        }
        roll -= chances[i];
    }
    return chances.Length - 1;
}

So for example, you could use

int roll = BiasedRandom(rng, 10, 10, 10, 10, 10, 50) + 1;

which will give a 10% chance for each of 1-5, and a 50% chance of getting a 6.

Jon Skeet
Hi there, and thanks for the quick response. Regarding language, I'm pretty comfortable with either PHP or C#, which is why I requested either via tag and question. However, math is obviously not a strong suit (at least not at the moment). I would much prefer something similar to the "gamma" approach used in the answer from the other question (which is actually a PHP example) to simplify the levels of probability. So, by specifying 1.5 as the "gamma" (or maybe 1.5 gammaHigh 0.9 gammaLow per the bonus), it would work as stated in the previous answer, but affect both upper and lower bounds.
Kevin Peno
@Kevin: It's not obvious to me *exactly* the probabilities relate to the gamma. I used the fact that you expressed two probabilities as percentages and extrapolated from that. You talk about using "the resulting cascade" - but how is that defined? What's the *real* scenario you're facing?
Jon Skeet
Hey Jon. I've updated my question. Please let me know if this is a better explaination.
Kevin Peno
A: 

First you need to characterize your current random number generator. In the case of PHP, the rand() function returns a nice flat profile - so there's no pre-processing required.

The remap the output distribution function so the area under it is unity and the range starts at 0. Then calculate its integral. Store the integral (e.g. as an array of values). Then when you need a random number matchnig the profile, first get a random number between 0 and 1 from the built-in generator, then find the Y coordinate on the integral where the X coordinate is the value you generated. Finally, scale the value to the desired range (e.g. if looking for a value between 0 and 10, multiply by 10, if looking for a value between -8 and +8, mutliply by 16 and subtract 8).

If your random number generator does not generate a flat profile, then the simplest approach would be to convert it to a flat profile using the reverse of the method above.

symcbean
+1  A: 

A general technique when generating non-uniform random number is using rejection sampling. Even though it may be ineffective in this case you still should know how to do this, because it works for any density function you provide.

function random($density, $max) {
    do {
        $rand = lcg_value();
        $rand2 = lcg_value() * $max;
    } while ($density($rand) < $rand2);
    return $rand;
}

$density here is a density function accepting a floating point number between zero and one as argument and returning a value smaller then $max. For your example this density function could be:

$density = function($x) {
    static $values = array(
        1 => 0.05,
        2 => 0.15,
        3 => 0.30,
        4 => 0.30,
        5 => 0.15,
        6 => 0.05,
    );

    return $values[ceil($x * 6)];
};

An example call then would be:

ceil(random($density, 0.3) * 6); // 0.3 is the greatest value returned by $density
// round and * 6 are used to map a 0 - 1 float to a 1 - 6 int.

Rejection sampling is especially useful if you can't easily calculate the inverse of the distribution. As in this case it is pretty easy to calculate the inverse using inverse transform sampling is probably the better choice. But that is already covered in Jon's answer.

PS: The above implementation is general and thus uses a random value between 0 and 1. By building a function that only works for your approach everything get's easier:

function random() {
    static $values = array(
        1 => 0.05,
        2 => 0.15,
        3 => 0.30,
        4 => 0.30,
        5 => 0.15,
        6 => 0.05,
    );

    do {
        $rand = mt_rand(1, 6);
        $rand2 = lcg_value() * 0.3;
    } while ($values[$rand] < $rand2);
    return $rand;
}

random();
nikic
+1 for the lesson in rejection sampling as well as combining that with php's new closures for a callback/density. As you've pointed out, it probably wont work 100% for my needs, but I very much appriciated the lesson in mathmatics vs. handed over code.
Kevin Peno
+4  A: 

Hi there Kevin!

Since I'm stuck at home today because of the flu :( I decided to try and figure this out for you. Essentially what you're asking for is some sort of interpolation. I used the easiest (linear) and these are my results and code. The code is kind of messy and I may fix it in the upcoming days..

<?php

// this function interpolates $a to $b over $steps steps, starting from key $k
// this can be cleaned up significantly
function interpolate($a, $b, $steps, $k) {
    @$per_step = abs($a - $b)/$steps; // suppress warnings in case of division by zero
    if ($a > $b)
        $decreasing = true;
    else
        $decreasing = false;
    $final = array();
    for ($i = 1; $i <= $steps-1; ++$i) {
        if ($decreasing)
            $final[$i+$k] = $a-=$per_step; // linear interpolation
        else
            $final[$i+$k] = $a+=$per_step; // linear interpolation
    }
    return $final;
}

// this function combines probability arrays after the interpolation occurs
// this may happen multiple times, think about 1, 3, 5. interpolation would have to occur
// from 1 -> 2 -> 3, and from 3 -> 4 -> 5.
function interpolateProbabilities ($nodes) {
    $pNodes = array();
    $pNodes = $nodes;
    $keys = array_keys($nodes);
    for ($i = 0; $i < count($keys); $i++) {
        if ($keys[$i+1] - $keys[$i] != 1) {
            $pNodes += interpolate($nodes[$keys[$i]], $nodes[$keys[$i+1]], $keys[$i+1] - $keys[$i], $keys[$i]);
        }
    }
    ksort($pNodes);
    return $pNodes;
}

// this generates a weighed random value and is pretty much copy-pasted from:
// http://w-shadow.com/blog/2008/12/10/fast-weighted-random-choice-in-php/
// it's robust and re-writing it would be somewhat pointless
function generateWeighedRandomValue($nodes) {
    $weights = array_values($nodes);
    $values = array_keys($nodes);
    $count = count($values);
    $i = 0;
    $n = 0;
    $num = mt_rand(0, array_sum($weights));
    while($i < $count) {
        $n += $weights[$i];
        if($n >= $num) {
            break;
           }
        $i++;
       }
    return $values[$i];
}

// two test cases
$nodes = array( 1 => 12, 5 => 22, 9 => 31, 10 => 35); // test 1
$nodes = array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10); // test 2
$export = array();

// run it 1000 times
for ($i = 0; $i < 1000; ++$i) {
    $export[generateWeighedRandomValue(interpolateProbabilities($nodes))]++;
}

// for copy-pasting into excel to test out distribution
print_r($export);

?>

The results are, I think, exactly what you're looking for. In the case of:

$nodes = array( 1 => 12, 5 => 22, 9 => 31, 10 => 35); // test 1

I got the following (final) array:

Array
(
    [5] => 92
    [7] => 94
    [10] => 162
    [8] => 140
    [3] => 71
    [6] => 114
    [2] => 75
    [4] => 69
    [9] => 131
    [1] => 52
)

Namely, 1 should happen 12% of the time, 5 22%, 9 31%, and 10 35% of the time. Lets graph it: graph 1

It looks promising, but lets try something crazier...

$nodes = array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10); // test 2

In this case, 3 should occur 50% of the time, and steeply decrease into 6. Lets see what happens! This is the array (in retrospect, I should have sorted these arrays):

Array
(
    [4] => 163
    [7] => 64
    [2] => 180
    [10] => 47
    [1] => 115
    [5] => 81
    [3] => 227
    [8] => 57
    [6] => 6
    [9] => 60
)

And lets look at the picture:

alt text

It looks like it works :)

I hope I was able to solve your problem (or at least point you in the right direction). Note that my code currently has a number of stipulations. Namely, the initial nodes you provide MUST have probabilities that add up to 100% or you may get some wonky behavior.

Also, the code is kind of messy but the concepts are relatively simple. Some other cool stuff would be to try and instead of using linear interpolation, use some other kind, which would give you more interesting results!


Algorithm

To avoid confusion I'll just show exactly how the algorithm works. I give PHP a $node array that's in the form of integer => frequency in percentage and ends up looking something like array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10), which is test 2 from above.

Test 2 basically says that you want 5 control nodes placed at 1, 3, 6, 7, and 10 with the frequencies of 22%, 50%, 2%, 16%, and 10% respectively. First, I need to see exactly where I need to do the interpolation. For example, I don't need to do it between 6 and 7, but I do need to do it between 1 and 3 (we need to interpolate 2) and 7 and 10 (where we need to interpolate 8 and 9).

The interpolation between 1 -> 3 has (3 - 1) - 1 = 1 steps and should be inserted at key[2] in the original array. The value (%) for the 1 -> 3 interpolation is abs($a - $b) / $steps which translates to the absolute value of the % of 1 minus the % of 2, divided by steps + 1 which, in our case, happens to equal 14. We need to see if the function is increasing or decreasing (hello Calculus). If the function is increasing we keep adding the step % to the new interpolation array until we filled all of our empty spots (if the function is decreasing, we subtract the step % value. Since we only need to fill one spot, we return 2 => 36 (22 + 14 = 36).

We combine the arrays and the result is (1 => 22, 2 => 36, 3 => 50, 6 => 2, 7 => 16, 10 => 10). The program interpolated 2, which was a percent value that we didn't explicitly declare.

In the case of 7 -> 10, there are 2 steps, the step percentage is 2 which comes from (16-10) / (3 + 1) = 2. The function is decreasing, so we need to subtract 2 repeatedly. The final interpolated array is (8 => 14, 9 => 12). We combine all of the arrays and voila.

The following image shows the green (initial values) and the red (interpolated values). You may have to "view image" to see the whole thing clearly. You'll notice that I use ± since the algorithm needs to figure out if we're supposed to be increasing or decreasing over a certain period.

alt text


This code should probably be written in a more OOP paradigm. I play a lot with array keys (for example, I need to pass $k so it's easier to combine arrays once I return them from interpolate($a, $b, $steps, $k) because they automatically have the right keys. This is just a PHP idiosyncrasy and in retrospect, I should have probably went with a more readable OOP approach to begin with.


This is my last edit, I promise :) Since I love playing with Excel, this shows how the percentages normalize once the numbers are interpolated. This is important to see, especially considering that in your first picture, what you're showing is somewhat of a mathematical impossibility.

Test 1 alt text Test 2 alt text

You'll notice that the percentages dampen significantly to accommodate the interpolation. Your second graph in reality would look more like this:

alt text

In this graph, I weighed 1 = > 1, 5 => 98, 10 => 1 and you see the extremes of the dampening effect. After all, percentages, by definition have to add up to 100! It's just important to realize that the dampening effect is directly proportional to the number of steps between extremes.

David Titarenco
When I was trying to solve this problem I kept asking myself what mathematical term this equated to, yet I, sadly, could never answer that. All in all, your answer is more robust than even my question asked of it, I can set any part of the range to a probability and get randoms of all other values based on the distance between set values. I just wish I understood the voodoo behind it a little more. Let me review and I'll be sure to get back to you before time is up :) P.S. Feel better!
Kevin Peno