views:

155

answers:

8

Let's say that I have a list of prizes:

PrizeA PrizeB PrizeC

And, for each of them, I want to draw a winner from a list of my attendees.

Give that my attendee list is as follows:

user1, user2, user3, user4, user5

What is an unbiased way to choose a user from that list?

Clearly, I will be using a cryptographically secure pseudo-random number generator, but how do I avoid a bias towards the front of the list? I assume I will not be using modulus?

EDIT
So, here is what I came up with:

class SecureRandom
{
    private RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();

    private ulong NextUlong()
    {
        byte[] data = new byte[8];
        rng.GetBytes(data);
        return BitConverter.ToUInt64(data, 0);
    }

    public int Next()
    {
        return (int)(NextUlong() % (ulong)int.MaxValue);
    }

    public int Next(int maxValue)
    {
        if (maxValue <= 0)
        {
            throw new ArgumentOutOfRangeException("maxValue");
        }

        ulong chop = ulong.MaxValue - (ulong.MaxValue % (ulong)maxValue);

        ulong rand;

        do
        {
            rand = NextUlong();
        } while (rand >= chop);

        return (int)(rand % (ulong)maxValue);
    }
}

BEWARE:

Next() Returns an int in the range [0, int.MaxValue]
Next(int.MaxValue) Returns an int in the range [0, int.MaxValue)

+3  A: 

Pseudocode for special random number generator:

rng is random number generator produces uniform integers from [0, max)
compute m = max modulo length of attendee list
do {
    draw a random number r from rng
} while(r >= max - m)
return r modulo length of attendee list

This eliminates the bias to the front part of the list. Then

put the attendees in some data structure indexable by integers
for every prize in the prize list
draw a random number r using above
compute index = r modulo length of attendee list
return the attendee at index

In C#:

public NextUnbiased(Random rg, int max) {
    do {
        int r = rg.Next();
    } while(r >= Int32.MaxValue - (Int32.MaxValue % max));
    return r % max;
}

public Attendee SelectWinner(IList<Attendee> attendees, Random rg) {    
    int winningAttendeeIndex = NextUnbiased(rg, attendees.Length)
    return attendees[winningAttendeeIndex];
}

Then:

// attendees is list of attendees
// rg is Random
foreach(Prize prize in prizes) {
    Attendee winner = SelectWinner(attendees, rg);
    Console.WriteLine("Prize {0} won by {1}", prize.ToString(), winner.ToString());
}
Jason
Does the same function exist for the PSRNG class?
John Gietzen
Also, this is tagged language-agnostic. ;)
John Gietzen
PSRNG class? Addressed the language-agnostic issue.
Jason
Sorry, CSPRNG. Like `RNGCryptoServiceProvider`.
John Gietzen
No, `RNGCryptoServiceProvider` does not have a method `Next`. But you can easily create one. Draw four random bytes (`GetBytes`) from your `RNGCryptoServiceProvider` and smash them together into an `int`.
Jason
Right, like using the `BitConverter.ToInt32(bytes)` I'm still concerned that there is a bias using the modulus, tho.
John Gietzen
Oh, i see what you did with the modulus, scraping off the top set of the numbers...
John Gietzen
Yes, exactly. I eliminated the bias by chopping off the part of the distribution that introduces the bias.
Jason
Roger, using that method.
John Gietzen
@John Gietzen: My pleasure.
Jason
It feels like a waste to throw away perfectly good entropy like that, tho.
John Gietzen
this method appears to allow 1 user to win all the prizes which may or may not be desirable. Of course you can always remove the winner after prize draw or just shuffle the attendees and draw the first N
jk
As discussed elsewhere on this post, any PRNG (rather than true RNG) will have a period smaller than the number of possible assignments of gifts. So there's still a bias, though small and harder to analyze than the usual one from just using modular arithmetic. As you said below, 'A "very minuscule" bias will not be good enough for my users.'
thouis
More on this: http://en.wikipedia.org/wiki/Shuffling#Pseudo-random_shuffles
thouis
There are RNGs with periods large enough to avoid the problems that you describe. For example, variants of the Mersenne Twister have periods as large as 2^216901 - 1.
Jason
@jk, I am using that method. Actually, it will be a configurable option. @thouis, I'm using the RNGCryptoServiceProvider which is a high-quality, non-repeating random number source.
John Gietzen
A: 

I suppose one answer would be to assign each item a random value, and take the largest or smallest, drilling down as necessary.

I'm not sure if this is the most efficient, tho...

John Gietzen
A: 

If you're using a good number generator, even with a modulus your bias will be miniscule. If, for instance, you're using a random number generator with 64 bits of entropy and five users, your bias toward the front of the array should be on the order of 3x10^-19 (my numbers may be off, by I don't think by much). That's an extra 3-in-10-quintillion likelihood of the first user winning compared to the later users. That should be good enough to be fair in anyone's book.

Dathan
A "very minuscule" bias will not be good enough for my users.
John Gietzen
+1  A: 

You can buy truly random bits from a provider, or use a mechanical device.

Hamish Grubijan
A: 

Here you will find Oleg Kiselyov's discussion of purely functional random shuffling.

A description of the linked content (quoted from the beginning of that article):

This article will give two pure functional programs that perfectly, randomly and uniformly shuffle a sequence of arbitrary elements. We prove that the algorithms are correct. The algorithms are implemented in Haskell and can trivially be re-written into other (functional) languages. We also discuss why a commonly used sort-based shuffle algorithm falls short of perfect shuffling.

You could use that to shuffle your list and then pick the first item of the shuffled result (or maybe you'd prefer not to give two prizes two the same person -- then use n initial positions of the result, for n = number of prizes); or you could simplify the algorithm to just produce the first item; or you could take a look around that site, because I could have sworn there's an article on picking one random element from an arbitrary tree-like structure with uniform distribution, in a purely functional way, proof of correctness provided, but my search-fu is failing me and I can't seem to find it.

Michał Marczyk
+1  A: 

Assuming a fairly distributed random number generator...

do {
    i = rand();
} while (i >= RAND_MAX / 5 * 5);
i /= 5;

This gives each of 5 slots

[ 0 .. RAND_MAX / 5 )
[ RAND_MAX / 5 .. RAND_MAX / 5 * 2 )
[ RAND_MAX / 5 * 2 .. RAND_MAX / 5 * 3 )
[ RAND_MAX / 5 * 3 .. RAND_MAX / 5 * 4 )
[ RAND_MAX / 5 * 4 .. RAND_MAX / 5 * 5 )

and discards a roll which falls out of range.

ephemient
A: 

Without truly random bits, you will always have some bias. The number of ways to assign prizes to guests is much larger than any common PRNG's period for even a fairly low number of guests and prizes. As suggested by lpthnc, buy some truly random bits, or buy some random-bit-generating hardware.

As for the algorithm, just do a random shuffle of the guest list. Be careful, as naive shuffling algorithms do have a bias: http://en.wikipedia.org/wiki/Shuffling#Shuffling%5Falgorithms

thouis
Truly random - has to be truly random and UNIFORM, not Gaussian.
Hamish Grubijan
I'm not sure what you're referring to. Truly random bits are not Gaussian. http://en.wikipedia.org/wiki/Hardware_random_number_generator
thouis
A: 

You have already seem several perfectly good answers that depend on knowing the length of the list in advance.

To fairly select a single item from a list without needing to know the length of the list in the first place do this:

 if (list.empty()) error_out_somehow
 r=list.first()          // r is a reference or pointer
 s=list.first()          // so is s
 i = 2
 while (r.next() is not NULL)
    r=r.next()
    if (random(i)==0) s=r  // random() returns a uniformly
                           // drawn integer between 0 and i
    i++
 return s

(Useful if you list is stored as a linked list)


To distribute prizes in this scenario, just walk down the list of prizes selecting a random winner for each one. (If you want to prevent double winning you then remove the winner from the participant list.)


Why does it work?

  1. You start with the first item at 1/1
  2. On the next pass, you select the second item half the time (1/2), which means that the first item has probability 1 * (2-1)/2 = 1/2
  3. on further iteration, you select the nth item with probability 1/n, and the chance for each previous item is reduced by a factor of (n-1)/n

which means that when you come to the end, the chance of having the mth item in the list (of n items) is

1/m * m/(m+1) * (m+1)/(m+2) * ... * (n-2)/(n-1) * (n-1)/n = 1/n

and is the same for every item.


If you are paying attention, you'll note that this means walking the whole list every time you want to select an item from the list, so this is not maximally efficient for (say) reordering the whole list (though it does that fairly).

dmckee