views:

93

answers:

5

How do I use a random number generator that gives bits (0 or 1) to simulate a fair 26-sided die? I want to use a bitstream to pick letters of the English alphabet such that the odds of any one letter coming up is the same as the odds of any other letter (I know real words aren't like that and have specific frequency distributions for each letter but it doesn't matter here). What's the best way to use binary 0/1 decisions to pick letters fairly from the set A-Z? I can think of a few ways to map bits onto letters but it's not obvious to me that they won't be biased. Is there a known good way?

+1  A: 

The most simple approach in your case is to throw 5 bits, what gives 32 (0-31) equiprobable outcomes. If you get a value outside your range (greater than 25) you try again (and again...)

The average number of "coins" (bits) to throw in this case for each letter would be

 5 x 32 / 26  = 6.15

(For reference, see geometric distribution)

leonbloy
+4  A: 

If you restrict yourself to a finite number of bits and your die has 26 sides the method will always be biased. You have to allow the possibility that you will have to look at a potentially unlimited number of bits to be sure that it is unbiased.

A simple algorithm is to choose a random number between 0 and the next largest number of the form 2^n - 1 (31 in this case). If the number you randomly pick is too large, discard it and repick until you get a number in range.

Clearly this is not an optimal algorithm as you "waste" some information, but it should be good enough for most purposes. It is most wasteful if the number of sides of the die is just above 2^m for some m, for example: 33 sides. In this case you will have to discard the value almost 50% of the time.

Mark Byers
Right answer. I would add the small point that, for any five bits whose decimal equivalent is greater than 26, you can retain the least significant bit, only discard the four MSBs, and regenerate four more random bits. This saves one bit while maintaining a uniform distribution.
Steve
A: 

A naive implementation would be to combine the random bits to get a decimal or integer value, using a fixed number of bits (say, 4 bytes to get an integer). Divide the result by the max possible value for the number of bits supplied, which I think should give you a decimal evenly distributed in the range 0-1. (Esentially a rand() function). Then do 26*rand()

RMorrisey
That wouldn't be a perfectly even distribution, although it does get better the more bits you use.
David Zaslavsky
A: 

26 is 11010 in binary.
Generate five bits, if they exceed 26, either:

  1. Return the value mod 26 (Will favor the lower values)
  2. Discard the result and go again (Has the possibility to never end)

Or generalizing it:
Generate (log n in base 2) + 1 bits. If they exceed n, return the value mod n, or discard & go again.

Rubys
In what world is 1101 binary equal to 26 decimal?
Carl Norum
My bad, forgot a zero at the end of it.
Rubys
+2  A: 

The basic answer here seems right - if your random number 0..32 is greater than 25, reroll. However, you can stack the odds against an arbitrarily-long result by looking for a multiple of 26 which provides a smaller chance of going long.

 32 -  26 =  6
 64 -  52 =  12
128 -  78 =  50

... and so on. I threw together a Python script to figure out the best available number of bits up to 32, for giggles, and got this result:

2^13 - 26 * 315 = 2
2^14 - 26 * 630 = 4

So either way, you have a 1 in 2^12 chance of rerolling if you use 13 or 14 bits. Your algorithm in this case would be:

def random_character():
    r = 8190
    while r >= 8190:
        r = rand(13) # assuming rand generates an N bit integer
    return chr(r % 26 + ord('a'))

EDIT: Out of curiosity, I compared those odds with a few important values, to see if 13 was really the optimal number (assuming you can generate any number of bits, 1 to 32, in the same amount of time - if you can't, 13 bits looks like the best). Based on my (admittedly sleepy) math, if you can get 32 bits as cheaply as 16, go for that instead. Otherwise, favor 13.

2^8 through 2^12: by definition, no better than 1/2^12 odds
2^16: diff is 16, so 1/2^11
2^17: diff is 6, so slightly under 1/2^14
2^18: diff is 12, so slightly under 1/2^12
2^19: diff is 24, so slightly under 1/2^14
2^20: diff is 22, so slightly under 1/2^15
2^21: diff is 18, so slightly under 1/2^16
2^22: diff is 10, so slightly under 1/2^18
2^23: diff is 20, so slightly under 1/2^18
2^24: diff is 14, so slightly under 1/2^20
2^25: diff is 2, so 1/2^24
2^26: diff is 4, so 1/2^24
2^27: diff is 8, so 1/2^24
2^28: diff is 16, so 1/2^24
2^29: diff is 6, so slightly under 1/2^26
2^30: diff is 12, so slightly under 1/2^26
2^31: diff is 24, so slightly under 1/2^26
2^32: diff is 22, so slightly under 1/2^27
ojrac

related questions