I saw this code in a comment for the article "Never-ending Shuffled Sequence". I understand the basic premise, but I don't know how it works. The biggest explanation I need is of the first two lines of the while loop.

(Because it is written in MATLAB I can only guess at how this code functions.)

probabilities = [1 1 1 1 1 1];
unrandomness = 1;
while true
    cumprob = cumsum(probabilities) ./ sum(probabilities);
    roll = find(cumprob >= rand, 1)
    probabilities = probabilities + unrandomness;
    probabilities(roll) = probabilities(roll) - 6*unrandomness;
    if min(probabilities) < 0
        probabilities = probabilities - min(probabilities);
    end
end
+11  A: 

The probabilities vector represents the relative weights for the likelihood that the numbers 1 through 6 will be selected. At the start, they all have an equal chance of being picked. I'll step through each line of the while loop explaining what it does:

  • The first line within the while loop turns the probabilities vector into a cumulative probability distribution. The CUMSUM function returns the running total along the length of the vector (for [1 1 1 1 1 1] that is [1 2 3 4 5 6]), and this is divided element-wise by the total sum of the vector (found using the SUM function). On the first pass through the loop, cumprob will have these values:

    0.1667    0.3333    0.5000    0.6667    0.8333    1.0000
    

    Notice that these values are the upper edges of six "bins" that a random number between 0 and 1 can fall into. The probability that a number falls within a given bin is equal to the width of that bin, so there's a 1 in 6 (0.1667) chance that a randomly drawn number will fall in the first bin (from 0 to 0.1667), the same chance for the second bin (from 0.1667 to 0.3333), and so on.

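  • The second line within the while loop picks a random number between 0 and 1 (using the RAND function). The comparison cumprob >= rand produces a logical (true/false) vector that is true for every bin edge at or above the random number, and the FIND function, called with a second argument of 1, returns the index of the first true element. The roll value is thus a number from 1 to 6, chosen with a probability equal to the width of its bin.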

  • The third line within the while loop adds "unrandomness" by shifting all the relative weights upward, moving the probabilities a little closer to being equal for all the numbers. Consider the example where probabilities has the following form:

    [x x x 1 x x]
    

    where x is some value greater than 1. At this point, the probability that the value 4 is chosen is 1/(5*x+1). By adding 1 to all the elements, that probability becomes 2/(5*x+7). For x = 3, the probability of 4 occurring increases from 1/16 = 0.0625 to 2/22 = 0.0909, while the probability of any other given number occurring decreases from 3/16 = 0.1875 to 4/22 = 0.1818. This "unrandomness" is thus acting to pull the probabilities back toward equal.

  • The fourth line within the while loop essentially does the opposite of the previous line by significantly dropping the relative weight of whatever number just occurred, making it less likely to happen on subsequent loops. This reduced likelihood of occurrence will be short-lived, because the previous line keeps pulling the probabilities of occurrence back toward equal for all the numbers.

    Note that the amount subtracted from the one element of probabilities is equal to the total amount added to all the elements in the previous line, resulting in a net change of zero for the total sum of the probabilities vector. This keeps the values in probabilities bounded so that they don't just keep growing and growing.

  • The if statement at the end of the while loop is simply there to make sure none of the numbers in probabilities are negative. If the minimum value of the vector (found using the MIN function) is less than zero, then this negative value is subtracted from every element, which shifts the whole vector up so that its smallest element is exactly zero. This makes sure the cumprob vector never decreases and always has values between 0 and 1.
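
For anyone who, like the asker, doesn't read MATLAB, the steps above can be sketched roughly in Python. This is my own translation and not part of the original code: the function names cumulative_bins, pick_roll, and update_weights are made up for illustration, and I replace the hard-coded 6 with the length of the list, which is an assumption about the intent.

```python
from itertools import accumulate
import random


def cumulative_bins(weights):
    """Stand-in for cumsum(probabilities) ./ sum(probabilities)."""
    total = sum(weights)
    return [running / total for running in accumulate(weights)]


def pick_roll(cumprob, r):
    """Stand-in for find(cumprob >= r, 1); returns a 1-based index."""
    for index, upper_edge in enumerate(cumprob, start=1):
        if upper_edge >= r:
            return index
    # Safeguard only: the last edge is 1.0, so r in [0, 1) never gets here.
    return len(cumprob)


def update_weights(weights, roll, unrandomness=1):
    """Stand-in for lines 3-4 of the loop plus the trailing if statement."""
    n = len(weights)                            # the MATLAB code hard-codes 6
    new = [w + unrandomness for w in weights]   # raise every weight
    new[roll - 1] -= n * unrandomness           # penalize the number just rolled
    lowest = min(new)
    if lowest < 0:                              # shift so the smallest weight is 0
        new = [w - lowest for w in new]
    return new


weights = [1, 1, 1, 1, 1, 1]
cumprob = cumulative_bins(weights)
roll = pick_roll(cumprob, random.random())
weights = update_weights(weights, roll)
print(roll, weights)
```

Passing the random number in as an argument, rather than drawing it inside pick_roll, is only a convenience so the selection step can be checked with fixed values such as 0.4 (which lands in the third bin).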

If you replace the while true statement with for i = 1:6, display the probabilities vector and roll value at the end of each iteration, and run the code a few times over, you can watch the weights change from roll to roll. Here's one such set of 6 rolls that draws each of the numbers 1 through 6 once:

roll             probabilities

 5   |  6     6     6     6     0     6
     |
 4   |  7     7     7     1     1     7
     |
 2   |  8     2     8     2     2     8
     |
 1   |  3     3     9     3     3     9
     |
 3   |  4     4     4     4     4    10
     |
 6   |  5     5     5     5     5     5

Notice how the final values in probabilities are all equal, meaning that at that point the numbers 1 through 6 all have an equal likelihood of being chosen once again.
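
The table can also be replayed deterministically by feeding in that same sequence of rolls instead of drawing random numbers. Here is a small Python sketch of that idea. It is an illustration rather than the original MATLAB, and the step function and the history list are hypothetical names of my own.

```python
def step(weights, roll, unrandomness=1):
    """One pass of the loop body for a given (1-based) roll."""
    new = [w + unrandomness for w in weights]   # line 3: raise every weight
    new[roll - 1] -= 6 * unrandomness           # line 4: penalize the rolled number
    lowest = min(new)
    if lowest < 0:                              # if statement: remove negatives
        new = [w - lowest for w in new]
    return new


rolls = [5, 4, 2, 1, 3, 6]        # the roll sequence from the table
weights = [1, 1, 1, 1, 1, 1]
history = []
for roll in rolls:
    weights = step(weights, roll)
    history.append(weights)
    # Relative weights converted to actual selection probabilities.
    chance = [round(w / sum(weights), 3) for w in weights]
    print(roll, weights, chance)
```

Each printed line shows the roll, the updated weights, and the chance that each number is picked on the next roll. Right after a number is rolled its chance is at or near zero, and it recovers over the following rolls.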

gnovice
Could you go into an explanation of the first two lines within the while loop? I don't know Matlab, so `cumsum` and `find` (especially since it seems to take a boolean input) have an undefined purpose to me.
The Wicked Flea
Thanks greatly, you gave a far better explanation than I anticipated.
The Wicked Flea
Glad to help! =)
gnovice
+1 very nice explanation. I have to say the first part of the code reminds me of roulette wheel selection in a genetic algorithm, where you want to pick individuals based on their fitness (survival of the fittest!)
Amro
