- What is simple solution
- What is effective solution to less minimum memory and(or) cpu speed?
views:
13592answers:
39int rand7( void )
{
int i;
do i=rand5()+rand5()-1; while( i > 7 );
return( i );
}
EDIT: Wrong, doesn't produce a uniform distribution.
EDIT2: Double-wrong, won't produce 1 (now fixed)
Are homework problems allowed here?
This function does crude "base 5" math to generate a number between 0 and 6.
function rnd7() {
do {
r1 = rnd5() - 1;
do {
r2=rnd5() - 1;
} while (r2 > 1);
result = r2 * 5 + r1;
} while (result > 6);
return result + 1;
}
There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5. One simple solution would be to use rejection sampling, e.g.:
int i;
do
{
i = 5 * (rand5() - 1) + rand5(); // i is now uniformly random between 1 and 25
} while(i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1; // result is now uniformly random between 1 and 7
This has an expected runtime of 25/21 = 1.19 iterations of the loop, but there is an infinitesimally small probability of looping forever.
int ans = 0;
while (ans == 0)
{
for (int i=0; i<3; i++)
{
while ((r = rand5()) == 3){};
ans += (r < 3) >> i
}
}
// Return a random value between 1 and 7
// In case of complaints of non random results just
// sigh, roll your eyes and point out if it returned
// the values the user expected it wouldn't be random
// would it?
int random1_to_7()
{
return random1_to_5();
}
rand7() = (rand5()+rand5()+rand5()+rand5()+rand5()+rand5()+rand5())%7+1
Edit: That doesn't quite work. It's off by about 2 parts in 1000. The buckets get:
value Count Error%
1 11158 -0.0035
2 11144 -0.0214
3 11144 -0.0214
4 11158 -0.0035
5 11172 +0.0144
6 11177 +0.0208
7 11172 +0.0144
By switching to a sum of
n Error%
10 +/- 1e-3,
12 +/- 1e-4,
14 +/- 1e-5,
16 +/- 1e-6,
...
28 +/- 3e-11
seems to gain an order of magnitude for every 2 added
int randbit( void )
{
while( 1 )
{
int r = rand5();
if( r <= 4 ) return(r & 1);
}
}
int randint( int nbits )
{
int result = 0;
while( nbits-- )
{
result = (result<<1) | randbit();
}
return( result );
}
int rand7( void )
{
while( 1 )
{
int r = randint( 3 ) + 1;
if( r <= 7 ) return( r );
}
}
solution in php
<?php
function random_5(){
return rand(1,5);
}
function random_7(){
$total = 0;
for($i=0;$i<7;$i++){
$total += random_5();
}
return ($total%7)+1;
}
echo random_7();
?>
The following produces a uniform distribution on {1, 2, 3, 4, 5, 6, 7} using a random number generator producing a uniform distribution on {1, 2, 3, 4, 5}. The code is messy, but the logic is clear.
public static int random_7(Random rg) {
int returnValue = 0;
while (returnValue == 0) {
for (int i = 1; i <= 3; i++) {
returnValue = (returnValue << 1) + SimulateFairCoin(rg);
}
}
return returnValue;
}
private static int SimulateFairCoin(Random rg) {
while (true) {
int flipOne = random_5_mod_2(rg);
int flipTwo = random_5_mod_2(rg);
if (flipOne == 0 && flipTwo == 1) {
return 0;
}
else if (flipOne == 1 && flipTwo == 0) {
return 1;
}
}
}
private static int random_5_mod_2(Random rg) {
return random_5(rg) % 2;
}
private static int random_5(Random rg) {
return rg.Next(5) + 1;
}
For number 1, can someone explain what's wrong with this?
public class Sandbox {
private Random random = new Random();
public static void main(String[] args) {
Sandbox sb = new Sandbox();
sb.go();
}
private void go() {
int [] places = new int[5];
for (int i = 0; i < 10000000; i++) {
int result = rand5();
places[result] = places[result] + 1;
}
for (int i = 0; i < places.length; i++) {
int place = places[i];
System.out.println("#" + i + " = " + place);
}
}
public int rand7() {
return random.nextInt(7);
}
public int rand5() {
int r = rand7();
switch (r) {
case 0:
case 1:
case 2:
case 3:
case 4:
return r;
case 5:
case 6:
return rand5();
default:
throw new IllegalStateException(r + "");
}
}
}
Assuming that rand(n) here means "random integer in a uniform distribution from 0 to n-1", here's a code sample using Python's randint, which has that effect. It uses only randint(5), and constants, to produce the effect of randint(7). A little silly, actually
from random import randint
sum = 7
while sum >= 7:
first = randint(0,5)
toadd = 9999
while toadd>1:
toadd = randint(0,5)
if toadd:
sum = first+5
else:
sum = first
assert 7>sum>=0
print sum
(I have stolen Adam Rosenfeld's answer and made it run about 7% faster.)
Assume that rand5() returns one of {0,1,2,3,4} with equal distribution and the goal is return {0,1,2,3,4,5,6} with equal distribution.
int rand7() {
i = 5 * rand5() + rand5();
max = 25;
//i is uniform among {0 ... max-1}
while(i < max%7) {
//i is uniform among {0 ... (max%7 - 1)}
i *= 5;
i += rand5(); //i is uniform {0 ... (((max%7)*5) - 1)}
max %= 7;
max *= 5; //once again, i is uniform among {0 ... max-1}
}
return(i%7);
}
We're keeping track of the largest value that the loop can make in the variable max
. If the reult so far is between max%7 and max-1 then the result will be uniformly distrubuted in that range. If not, we use the remainder, which is random between 0 and max%7-1, and another call to rand() to make a new number and a new max. Then we start again.
Edit: Expect number of times to call rand5() is x in this equation:
x = 2 * 21/25
+ 3 * 4/25 * 14/20
+ 4 * 4/25 * 6/20 * 28/30
+ 5 * 4/25 * 6/20 * 2/30 * 7/10
+ 6 * 4/25 * 6/20 * 2/30 * 3/10 * 14/15
+ (6+x) * 4/25 * 6/20 * 2/30 * 3/10 * 1/15
x = about 2.21 calls to rand5()
A constant time solution that produces approximately uniform distribution. The trick is 625 happens to be cleanly divisible by 7 and you can get uniform distributions as you build up to that range.
Edit: My bad, I miscalculated, but instead of pulling it I'll leave it in case someone finds it useful/entertaining. It does actually work after all... :)
int rand5()
{
return (rand() % 5) + 1;
}
int rand25()
{
return (5 * (rand5() - 1) + rand5());
}
int rand625()
{
return (25 * (rand25() - 1) + rand25());
}
int rand7()
{
return ((625 * (rand625() - 1) + rand625()) - 1) % 7 + 1;
}
int rand7()
{
int zero_one_or_two = ( rand5() + rand5() - 1 ) % 3 ;
return rand5() + zero_one_or_two ;
}
I feel stupid in front of all this complicated answsers.
Why can't it be :
int random1_to_7()
{
return (random1_to_5() * 7) / 5;
}
?
#!/usr/bin/env ruby
class Integer
def rand7
rand(6)+1
end
end
def rand5
rand(4)+1
end
x = rand5() # x => int between 1 and 5
y = x.rand7() # y => int between 1 and 7
..although that may possibly be considered cheating..
I have played around and I write "testing environment" for this Rand(7) algorithm. For example if you want to try what distribution gives your algorithm or how much iterations takes to generate all distinct random values (for Rand(7) 1-7), you can use it.
My core algorithm is this:
return (Rand5() + Rand5()) % 7 + 1;
Well is no less uniformly distributed then Adam Rosenfield's one. (which I included in my snippet code)
private static int Rand7WithRand5()
{
//PUT YOU FAVOURITE ALGORITHM HERE//
//1. Stackoverflow winner
int i;
do
{
i = 5 * (Rand5() - 1) + Rand5(); // i is now uniformly random between 1 and 25
} while (i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1;
//My 2 cents
//return (Rand5() + Rand5()) % 7 + 1;
}
This "testing environment" can take any Rand(n) algorithm and test and evaluate it (distribution and speed). Just put your code into the "Rand7WithRand5" method and run the snippet.
Few observations:
- Adam Rosenfield's algorithm is no better distributed then, for example, mine. Anyway, both algorithms distribution is horrible.
- Native Rand7 (
random.Next(1, 8)
) is completed as it generated all members in given interval in around 200+ iterations, Rand7WithRand5 algorithms take order of 10k (around 30-70k) - Real challenge is not to write a method to generate Rand(7) from Rand(5), but it generate values more or less uniformly distributed.
Here is a working Python implementation of Adam's answer.
import random
def rand5():
return random.randint(1, 5)
def rand7():
while True:
r = 5 * (rand5() - 1) + rand5()
#r is now uniformly random between 1 and 25
if (r <= 21):
break
#result is now uniformly random between 1 and 7
return r % 7 + 1
I like to throw algorithms I'm looking at into Python so I can play around with them, thought I'd post it here in the hopes that it is useful to someone out there, not that it took long to throw together.
By using a rolling total, you can both
- maintain an equal distribution; and
- not have to sacrifice any element in the random sequence.
Both these problems are an issue with the simplistic rand(5)+rand(5)...
-type solutions. The following Python code shows how to implement it (most of this is proving the distribution).
import random
x = []
for i in range (0,7):
x.append (0)
t = 0
tt = 0
for i in range (0,700000):
########################################
##### qq.py #####
r = int (random.random () * 5)
t = (t + r) % 7
########################################
##### qq_notsogood.py #####
#r = 20
#while r > 6:
#r = int (random.random () * 5)
#r = r + int (random.random () * 5)
#t = r
########################################
x[t] = x[t] + 1
tt = tt + 1
high = x[0]
low = x[0]
for i in range (0,7):
print "%d: %7d %.5f" % (i, x[i], 100.0 * x[i] / tt)
if x[i] < low:
low = x[i]
if x[i] > high:
high = x[i]
diff = high - low
print "Variation = %d (%.5f%%)" % (diff, 100.0 * diff / tt)
And this output shows the results:
pax$ python qq.py
0: 99908 14.27257
1: 100029 14.28986
2: 100327 14.33243
3: 100395 14.34214
4: 99104 14.15771
5: 99829 14.26129
6: 100408 14.34400
Variation = 1304 (0.18629%)
pax$ python qq.py
0: 99547 14.22100
1: 100229 14.31843
2: 100078 14.29686
3: 99451 14.20729
4: 100284 14.32629
5: 100038 14.29114
6: 100373 14.33900
Variation = 922 (0.13171%)
pax$ python qq.py
0: 100481 14.35443
1: 99188 14.16971
2: 100284 14.32629
3: 100222 14.31743
4: 99960 14.28000
5: 99426 14.20371
6: 100439 14.34843
Variation = 1293 (0.18471%)
A simplistic rand(5)+rand(5)
, ignoring those cases where this returns more than 6 has a typical variation of 18%, 100 times that of the method shown above:
pax$ python qq_notsogood.py
0: 31756 4.53657
1: 63304 9.04343
2: 95507 13.64386
3: 127825 18.26071
4: 158851 22.69300
5: 127567 18.22386
6: 95190 13.59857
Variation = 127095 (18.15643%)
pax$ python qq_notsogood.py
0: 31792 4.54171
1: 63637 9.09100
2: 95641 13.66300
3: 127627 18.23243
4: 158751 22.67871
5: 126782 18.11171
6: 95770 13.68143
Variation = 126959 (18.13700%)
pax$ python qq_notsogood.py
0: 31955 4.56500
1: 63485 9.06929
2: 94849 13.54986
3: 127737 18.24814
4: 159687 22.81243
5: 127391 18.19871
6: 94896 13.55657
Variation = 127732 (18.24743%)
And, on the advice of Nixuz, I've cleaned the script up so you can just extract and use the rand7...
stuff:
import random
# rand5() returns 0 through 4 inclusive.
def rand5():
return int (random.random () * 5)
# rand7() generator returns 0 through 6 inclusive (using rand5()).
def rand7():
rand7ret = 0
while True:
rand7ret = (rand7ret + rand5()) % 7
yield rand7ret
# Number of test runs.
count = 700000
# Work out distribution.
distrib = [0,0,0,0,0,0,0]
rgen =rand7()
for i in range (0,count):
r = rgen.next()
distrib[r] = distrib[r] + 1
# Print distributions and calculate variation.
high = distrib[0]
low = distrib[0]
for i in range (0,7):
print "%d: %7d %.5f" % (i, distrib[i], 100.0 * distrib[i] / count)
if distrib[i] < low:
low = distrib[i]
if distrib[i] > high:
high = distrib[i]
diff = high - low
print "Variation = %d (%.5f%%)" % (diff, 100.0 * diff / count)
This is equivalent to Adam Rosenfield's solution, but may be a bit more clear for some readers. It assumes rand5() is a function that returns a statistically random integer in the range 1 through 5 inclusive.
int rand7()
{
int vals[5][5] = {
{ 1, 2, 3, 4, 5 },
{ 6, 7, 1, 2, 3 },
{ 4, 5, 6, 7, 1 },
{ 2, 3, 4, 5, 6 },
{ 7, 0, 0, 0, 0 }
};
int result = 0;
while (result == 0)
{
int i = rand5();
int j = rand5();
result = vals[i-1][j-1];
}
return result;
}
How does it work? Think of it like this: imagine printing out this double-dimension array on paper, tacking it up to a dart board and randomly throwing darts at it. If you hit a non-zero value, it's a statistically random value between 1 and 7, since there are an equal number of non-zero values to choose from. If you hit a zero, just keep throwing the dart until you hit a non-zero. That's what this code is doing: the i and j indexes randomly select a location on the dart board, and if we don't get a good result, we keep throwing darts.
Like Adam said, this can run forever in the worst case, but statistically the worst case never happens. :)
If we consider the additional constraint of trying to give the most efficient answer i.e one that given an input stream, I, of uniformly distributed integers of length m from 1-5 outputs a stream O, of uniformly distributed integers from 1-7 of the longest length relative to m, say L(m).
The simplest way to analyse this is to treat the streams I and O as 5-ary and 7-ary numbers respectively. This is achieved by the main answer's idea of taking the stream a1, a2, a3,... -> a1+5*a2+5^2*a3+.. and similarly for stream O.
Then if we take a section of the input stream of length m choose n s.t. 5^m-7^n=c where c>0 and is as small as possible. Then there is a uniform map from the input stream of length m to integers from 1 to 5^m and another uniform map from integers from 1 to 7^n to the output stream of length n where we may have to lose a few cases from the input stream when the mapped integer exceeds 7^n.
So this gives a value for L(m) of around m (log5/log7) which is approximately .82m.
The difficulty with the above analysis is the equation 5^m-7^n=c which is not easy to solve exactly and the case where the uniform value from 1 to 5^m exceeds 7^n and we lose efficiency.
The question is how close to the best possible value of m (log5/log7) can be attain. For example when this number approaches close to an integer can we find a way to achieve this exact integral number of output values?
If 5^m-7^n=c then from the input stream we effectively generate a uniform random number from 0 to (5^m)-1 and don't use any values higher than 7^n. However these values can be rescued and used again. They effectively generate a uniform sequence of numbers from 1 to 5^m-7^n. So we can then try to use these and convert them into 7-ary numbers so that we can create more output values.
If we let T7(X) to be the average length of the output sequence of random(1-7) integers derived from a uniform input of size X, and assuming that 5^m=7^n0+7^n1+7^n2+...+7^nr+s, s<7.
Then T7(5^m)=n0x7^n0/5^m + ((5^m-7^n0)/5^m) T7(5^m-7^n0) since we have a length no sequence with probability 7^n0/5^m with a residual of length 5^m-7^n0 with probability (5^m-7^n0)/5^m).
If we just keep substituting we obtain:
T7(5^m) = n0x7^n0/5^m + n1x7^n1/5^m + ... + nrx7^nr/5^m = (n0x7^n0 + n1x7^n1 + ... + nrx7^nr)/5^m
Hence L(m)=T7(5^m)=(n0x7^n0 + n1x7^n1 + ... + nrx7^nr)/(7^n0+7^n1+7^n2+...+7^nr+s)
Another way of putting this is:
If 5^m has 7-ary representation a0+a1*7 + a2*7^2 + a3*7^3+...+ar*7^r Then L(m) = (a1*7 + 2a2*7^2 + 3a3*7^3+...+rar*7^r)/(a0+a1*7 + a2*7^2 + a3*7^3+...+ar*7^r)
The best possible case is my original one above where 5^m=7^n+s, where s<7.
Then T7(5^m) = nx(7^n)/(7^n+s) = n+o(1) = m (Log5/Log7)+o(1) as before.
The worst case is when we can only find k and s.t 5^m = kx7+s.
Then T7(5^m) = 1x(k.7)/(k.7+s) = 1+o(1).
Other cases are somewhere inbetween. It would be interesting to see how well we can do for very large m, i.e. how good can we get the error term:
T7(5^m) = m (Log5/Log7)+e(m).
It seems impossible to achieve e(m) = o(1) in general but hopefully we can prove e(m)=o(m).
The whole thing then rests on the distribution of the 7-ary digits of 5^m for various values of m.
I'm sure there is a lot of theory out there that covers this I may have a look and report back at some point.
This answer is more an experiment in obtaining the most entropy possible from the Rand5 function. t is therefore somewhat unclear and almost certainly a lot slower than other implementations.
Assuming the uniform distribution from 0-4 and resulting uniform distribution from 0-6:
public class SevenFromFive
{
public SevenFromFive()
{
// this outputs a uniform ditribution but for some reason including it
// screws up the output distribution
// open question Why?
this.fifth = new ProbabilityCondensor(5, b => {});
this.eigth = new ProbabilityCondensor(8, AddEntropy);
}
private static Random r = new Random();
private static uint Rand5()
{
return (uint)r.Next(0,5);
}
private class ProbabilityCondensor
{
private readonly int samples;
private int counter;
private int store;
private readonly Action<bool> output;
public ProbabilityCondensor(int chanceOfTrueReciprocal,
Action<bool> output)
{
this.output = output;
this.samples = chanceOfTrueReciprocal - 1;
}
public void Add(bool bit)
{
this.counter++;
if (bit)
this.store++;
if (counter == samples)
{
bool? e;
if (store == 0)
e = false;
else if (store == 1)
e = true;
else
e = null;// discard for now
counter = 0;
store = 0;
if (e.HasValue)
output(e.Value);
}
}
}
ulong buffer = 0;
const ulong Mask = 7UL;
int bitsAvail = 0;
private readonly ProbabilityCondensor fifth;
private readonly ProbabilityCondensor eigth;
private void AddEntropy(bool bit)
{
buffer <<= 1;
if (bit)
buffer |= 1;
bitsAvail++;
}
private void AddTwoBitsEntropy(uint u)
{
buffer <<= 2;
buffer |= (u & 3UL);
bitsAvail += 2;
}
public uint Rand7()
{
uint selection;
do
{
while (bitsAvail < 3)
{
var x = Rand5();
if (x < 4)
{
// put the two low order bits straight in
AddTwoBitsEntropy(x);
fifth.Add(false);
}
else
{
fifth.Add(true);
}
}
// read 3 bits
selection = (uint)((buffer & Mask));
bitsAvail -= 3;
buffer >>= 3;
if (selection == 7)
eigth.Add(true);
else
eigth.Add(false);
}
while (selection == 7);
return selection;
}
}
The number of bits added to the buffer per call to Rand5 is currently 4/5 * 2 so 1.6. If the 1/5 probability value is included that increases by 0.05 so 1.65 but see the comment in the code where I have had to disable this.
Bits consumed by call to Rand7 = 3 + 1/8 * (3 + 1/8 * (3 + 1/8 * (...
This is 3 + 3/8 + 3/64 + 3/512 ... so approx 3.42
By extracting information from the sevens I reclaim 1/8*1/7 bits per call so about 0.018
This gives a net consumption 3.4 bits per call which means the ratio is 2.125 calls to Rand5 for every Rand7. The optimum should be 2.1.
I would imagine this approach is significantly slower than many of the other ones here unless the cost of the call to Rand5 is extremely expensive (say calling out to some external source of entropy).
I'd like to add another answer, in addition to my first answer. This answer attempts to minimize the number of calls to rand5()
per call to rand7()
, to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities with the logic presented in Ivan's answer.
The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log2 N. Thus, rand5()
has approximately 2.32193 bits of entropy, and rand7()
has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5()
, and apply them to generating 2.80735 bits of entropy needed for each call to rand7()
. The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5()
per call to rand7()
.
Side notes: all logarithms in this answer will be base 2 unless specified otherwise. rand5()
will be assumed to return numbers in the range [0, 4], and rand7()
will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.
So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number -- we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number 0.a
1a
2a
3..., where each digit ai
is chosen by a call to rand5()
. For example, if our RNG chose ai
= 1 for all i
, then ignoring the fact that that isn't very random, that would correspond to the real number 1/5 + 1/52 + 1/53 + ... = 1/4 (sum of a geometric series).
Ok, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a
, b
] equals the length of that interval, b - a
. The proof is left as an exercise for the reader =).
Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7()
. How do we do this? Just the reverse of what we just did -- we convert it to an infinitely precise decimal in base 7, and then each base 7 digit will correspond to one output of rand7()
.
Taking the example from earlier, if our rand5()
produces an infinite stream of 1's, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite decimal 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.
Ok, so we have the main idea, but we have two problems left: we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Secondly, how do we actually convert it to base 7?
One way we can convert a number between 0 and 1 to base 7 is as follows:
- Multiply by 7
- The integral part of the result is the next base 7 digit
- Subtract off the integral part, leaving only the fractional part
- Goto step 1
To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5()
twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5()
produce, the random real number we're generating will never be larger than 0.12: it is always true that 0.11 ≤ 0.11xyz... < 0.12.
So, keeping track of the current number so far, and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k
digits, then we can safely output the next k
digits -- regardless of what the infinite stream of base 5 digits are, they will never affect the next k
digits of the base 7 representation!
And that's the algorithm -- to generate the next output of rand7()
, we generate only as many digits of rand5()
as we need to ensure that we know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness:
import random
rand5_calls = 0
def rand5():
global rand5_calls
rand5_calls += 1
return random.randint(0, 4)
def rand7_gen():
state = 0
pow5 = 1
pow7 = 7
while True:
if state / pow5 == (state + pow7) / pow5:
result = state / pow5
state = (state - result * pow5) * 7
pow7 *= 7
yield result
else:
state = 5 * state + pow7 * rand5()
pow5 *= 5
if __name__ == '__main__':
r7 = rand7_gen()
N = 10000
x = list(next(r7) for i in range(N))
distr = [x.count(i) for i in range(7)]
expmean = N / 7.0
expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))
print '%d TRIALS' % N
print 'Expected mean: %.1f' % expmean
print 'Expected standard deviation: %.1f' % expstddev
print
print 'DISTRIBUTION:'
for i in range(7):
print '%d: %d (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev)
print
print 'Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N)
Note that rand7_gen()
returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7)
10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.
Also note that the numbers here get very big, very fast. Powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to bignum arithmetic. But remember here, my goal was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).
In one run of this, I made 12091 calls to rand5()
for 10000 calls to rand7()
, achieving the minimum of log(7)/log(5) calls on average to 4 significant figures, and the resulting output was uniform.
In order to port this code to a language that doesn't have arbitrarily large integers built-in, you'll have to cap the values of pow5
and pow7
to the maximum value of your native integral type -- if they get too big, then reset everything and start over. This will increase the average number of calls to rand5()
per call to rand7()
very slightly, but hopefully it shouldn't increase too much even for 32- or 64-bit integers.
The premise behind Adam Rosenfield's correct answer is:
- x = 5^n (in his case: n=2)
- manipulate n rand5 calls to get a number y within range [1, x]
- z = ((int)(x / 7)) * 7
- if y > z, try again. else return y % 7 + 1
When n equals 2, you have 4 throw-away possibilities: y = {22, 23, 24, 25}. If you use n equals 6, you only have 1 throw-away: y = {15625}.
5^6 = 15625
7 * 2232 = 15624
You call rand5 more times. However, you have a much lower chance of getting a throw-away value (or an infinite loop). If there is a way to get no possible throw-away value for y, I haven't found it yet.
Here's my answer:
static struct rand_buffer {
unsigned v, count;
} buf2, buf3;
void push (struct rand_buffer *buf, unsigned n, unsigned v)
{
buf->v = buf->v * n + v;
++buf->count;
}
#define PUSH(n, v) push (&buf##n, n, v)
int rand16 (void)
{
int v = buf2.v & 0xf;
buf2.v >>= 4;
buf2.count -= 4;
return v;
}
int rand9 (void)
{
int v = buf3.v % 9;
buf3.v /= 9;
buf3.count -= 2;
return v;
}
int rand7 (void)
{
if (buf3.count >= 2) {
int v = rand9 ();
if (v < 7)
return v % 7 + 1;
PUSH (2, v - 7);
}
for (;;) {
if (buf2.count >= 4) {
int v = rand16 ();
if (v < 14) {
PUSH (2, v / 7);
return v % 7 + 1;
}
PUSH (2, v - 14);
}
// Get a number between 0 & 25
int v = 5 * (rand5 () - 1) + rand5 () - 1;
if (v < 21) {
PUSH (3, v / 7);
return v % 7 + 1;
}
v -= 21;
PUSH (2, v & 1);
PUSH (2, v >> 1);
}
}
It's a little more complicated than others, but I believe it minimises the calls to rand5. As with other solutions, there's a small probability that it could loop for a long time.
There are elegant algorithms cited above, but here's one way to approach it, although it might be roundabout. I am assuming values generated from 0.
R2 = random number generator giving values less than 2 (sample space = {0, 1})
R8 = random number generator giving values less than 8 (sample space = {0, 1, 2, 3, 4, 5, 6, 7})
In order to generate R8 from R2, you will run R2 thrice, and use the combined result of all 3 runs as a binary number with 3 digits. Here are the range of values when R2 is ran thrice:
0 0 0 --> 0
.
.
1 1 1 --> 7
Now to generate R7 from R8, we simply run R7 again if it returns 7:
int R7() {
do {
x = R8();
} while (x > 6)
return x;
}
The roundabout solution is to generate R2 from R5 (just like we generated R7 from R8), then R8 from R2 and then R7 from R8.
Why not do it simple?
int random7() {
return random5() + (random5() % 3);
}
The chances of getting 1 and 7 in this solution is lower due to the modulo, however, if you just want a quick and readable solution, this is the way to go.
As long as there aren't seven possibilities left to choose from, draw another random number, which multiplies the number of possibilities by five. In Perl:
$num = 0;
$possibilities = 1;
sub rand7
{
while( $possibilities < 7 )
{
$num = $num * 5 + int(rand(5));
$possibilities *= 5;
}
my $result = $num % 7;
$num = int( $num / 7 );
$possibilities /= 7;
return $result;
}
The function you need is *rand1_7()*, I wrote rand1_5() so that you can test it and plot it.
import numpy
def rand1_5():
return numpy.random.randint(5)+1
def rand1_7():
q = 0
for i in xrange(7): q+= rand1_5()
return q%7 + 1
There you go, uniform distribution and zero rand5 calls.
def rand7:
seed += 1
if seed >= 7:
seed = 0
yield seed
Need to set seed beforehand.
Here's a solution that fits entirely within integers and is within about 4% of optimal (i.e. uses 1.26 random numbers in {0..4} for every one in {0..6}). The code's in Scala, but the math should be reasonably clear in any language: you take advantage of the fact that 7^9 + 7^8 is very close to 5^11. So you pick an 11 digit number in base 5, and then interpret it as a 9 digit number in base 7 if it's in range (giving 9 base 7 numbers), or as an 8 digit number if it's over the 9 digit number, etc.:
abstract class RNG {
def apply(): Int
}
class Random5 extends RNG {
val rng = new scala.util.Random
var count = 0
def apply() = { count += 1 ; rng.nextInt(5) }
}
class FiveSevener(five: RNG) {
val sevens = new Array[Int](9)
var nsevens = 0
val to9 = 40353607;
val to8 = 5764801;
val to7 = 823543;
def loadSevens(value: Int, count: Int) {
nsevens = 0;
var remaining = value;
while (nsevens < count) {
sevens(nsevens) = remaining % 7
remaining /= 7
nsevens += 1
}
}
def loadSevens {
var fivepow11 = 0;
var i=0
while (i<11) { i+=1 ; fivepow11 = five() + fivepow11*5 }
if (fivepow11 < to9) { loadSevens(fivepow11 , 9) ; return }
fivepow11 -= to9
if (fivepow11 < to8) { loadSevens(fivepow11 , 8) ; return }
fivepow11 -= to8
if (fivepow11 < 3*to7) loadSevens(fivepow11 % to7 , 7)
else loadSevens
}
def apply() = {
if (nsevens==0) loadSevens
nsevens -= 1
sevens(nsevens)
}
}
If you paste a test into the interpreter (REPL actually), you get:
scala> val five = new Random5
five: Random5 = Random5@e9c592
scala> val seven = new FiveSevener(five)
seven: FiveSevener = FiveSevener@143c423
scala> val counts = new Array[Int](7)
counts: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0)
scala> var i=0 ; while (i < 100000000) { counts( seven() ) += 1 ; i += 1 }
i: Int = 100000000
scala> counts
res0: Array[Int] = Array(14280662, 14293012, 14281286, 14284836, 14287188,
14289332, 14283684)
scala> five.count
res1: Int = 125902876
The distribution is nice and flat (within about 10k of 1/7 of 10^8 in each bin, as expected from an approximately-Gaussian distribution).
I know it has been answered, but is this seems to work ok, but I can not tell you if it has a bias. My 'testing' suggests it is, at least, reasonable.
Perhaps Adam Rosenfield would be kind enough to comment?
My (naive?) idea is this:
Accumulate rand5's until there is enough random bits to make a rand7. This takes at most 2 rand5's. To get the rand7 number I use the accumulated value mod 7.
To avoid the accumulator overflowing, and since the accumulator is mod 7 then I take the mod 7 of the accumulator:
(5a + rand5) % 7 = (k*7 + (5a%7) + rand5) % 7 = ( (5a%7) + rand5) % 7
The rand7() function follows:
(I let the range of rand5 be 0-4 and rand7 is likewise 0-6.)
int rand7(){
static int a=0;
static int e=0;
int r;
a = a * 5 + rand5();
e = e + 5; // added 5/7ths of a rand7 number
if ( e<7 ){
a = a * 5 + rand5();
e = e + 5; // another 5/7ths
}
r = a % 7;
e = e - 7; // removed a rand7 number
a = a % 7;
return r;
}
Edit: Added results for 100 million trials.
'Real' rand functions mod 5 or 7
rand5 : avg=1.999802 0:20003944 1:19999889 2:20003690 3:19996938 4:19995539 rand7 : avg=3.000111 0:14282851 1:14282879 2:14284554 3:14288546 4:14292388 5:14288736 6:14280046
My rand7
Average looks ok and number distributions look ok too.
randt : avg=3.000080 0:14288793 1:14280135 2:14287848 3:14285277 4:14286341 5:14278663 6:14292943
just scale your output from your first function
0) you have a number in range 1-5
1) subtract 1 to make it in range 0-4
2) multiply by (7-1)/(5-1) to make it in range 0-6
3) add 1 to increment the range: Now your result is in between 1-7
Simple and efficient:
int rand7 ( void )
{
return 4; // this number has been calculated using
// rand5() and is in the range 1..7
}
(Inspired by http://stackoverflow.com/questions/84556/whats-your-favorite-programmer-cartoon/84747#84747).
I don't like ranges starting from 1, so I'll start from 0 :-)
unsigned rand5()
{
return rand() % 5;
}
unsigned rand7()
{
int r;
do
{
r = rand5();
r = r * 5 + rand5();
r = r * 5 + rand5();
r = r * 5 + rand5();
r = r * 5 + rand5();
r = r * 5 + rand5();
} while (r > 15623);
return r / 2232;
}
function Rand7
put 200 into x
repeat while x > 118
put ((random(5)-1) * 25) + ((random(5)-1) * 5) + (random(5)-1) into x
end repeat
return (x mod 7) + 1
end Rand7
Three calls to Rand5, which only repeats 6 times out of 125, on average.
Think of it as a 3D array, 5x5x5, filled with 1 to 7 over and over, and 6 blanks. Re-roll on the blanks. The rand5 calls create a three digit base-5 index into that array.
There would be fewer repeats with a 4D, or higher N-dimensional arrays, but this means more calls to the rand5 function become standard. You'll start to get diminishing efficiency returns at higher dimensions. Three seems to me to be a good compromise, but I haven't tested them against each other to be sure. And it would be rand5-implementation specific.