ansaurus

Question

Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.

Answer 1

A:

int rand7( void )
{
    int i;
    do i=rand5()+rand5()-1; while( i > 7 );
    return( i );
}

EDIT: Wrong, doesn't produce a uniform distribution.

EDIT2: Double-wrong, won't produce 1 (now fixed)

2008-09-26 04:37:34

I don't believe this will give a uniform distribution

Brent.Longborough 2008-09-26 04:42:33

This does not produce a uniform distribution.

Adam Rosenfield 2008-09-26 04:42:49

Will not generate 1. Ever!!

paxdiablo 2008-09-26 04:45:52

Just make it rand5 + rand5 - 1

paxdiablo 2008-09-26 04:51:49

Specification didn't call for an even distribution.

paxdiablo 2008-09-26 05:07:03

Um, I think "even distribution" is assumed. Otherwise you could just do "return 1;"

Beska 2009-04-30 17:03:47

Answer 2

+2 A:

Are homework problems allowed here?

This function does crude "base 5" math to generate a number between 0 and 6.

function rnd7() {
    do {
        r1 = rnd5() - 1;
        do {
            r2=rnd5() - 1;
        } while (r2 > 1);
        result = r2 * 5 + r1;
    } while (result > 6);
    return result + 1;
}

Will Hartung 2008-09-26 04:41:39

They are allowed.

mafutrct 2009-05-04 05:50:43

A correct solution (which puts you way ahead of the curve), although not very efficient. This makes an average of 5 calls to rnd5() for each call to rnd7().

Adam Rosenfield 2009-05-08 03:24:51

Answer 3

+120 A:

There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5. One simple solution would be to use rejection sampling, e.g.:


int i;
do
{
  i = 5 * (rand5() - 1) + rand5();  // i is now uniformly random between 1 and 25
} while(i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1;  // result is now uniformly random between 1 and 7

This has an expected runtime of 25/21 = 1.19 iterations of the loop, but there is an infinitesimally small probability of looping forever.

Adam Rosenfield 2008-09-26 04:41:59
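One way to check the claims in the answer above without sampling is to enumerate all 25 equally likely pairs of rand5() results. The following Python sketch does that; the names `accepted_counts` and `expected_calls` are made up for this illustration and are not part of the original answer.

```python
from collections import Counter
from fractions import Fraction

# Enumerate every equally likely (first, second) pair of rand5() results.
accepted_counts = Counter()
accepted = 0
for first in range(1, 6):
    for second in range(1, 6):
        i = 5 * (first - 1) + second   # 1..25, each value hit exactly once
        if i <= 21:                    # the loop only exits with i in 1..21
            accepted += 1
            accepted_counts[i % 7 + 1] += 1

# Each pass costs 2 calls and is accepted with probability accepted/25.
expected_calls = 2 * Fraction(25, accepted)

print(dict(accepted_counts))   # how many accepted pairs map to each result
print(expected_calls)          # expected rand5() calls per rand7() call
```

If every result from 1 to 7 is backed by the same number of accepted pairs, the output is uniform; the expected call count is simply two calls per pass times the expected number of passes.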

The -1 is not needed if the >21 is changed to >26, because it doesn't matter where i's lower bound maps to.

BCS 2009-01-15 18:01:12

Uh, can anyone explain how this really works and produces a uniform distribution? The Wikipedia page on rejection sampling was not of much help.

sundar 2009-05-04 05:50:44

I add an answer below that runs faster than this one by about 7%.

Eyal 2009-05-04 09:29:08

My take on explaining why this is correct: Say that I want to write a program that outputs a stream of uniform random numbers from 1 to 25; for that I'd just return 5 * (rand5() - 1) + rand5() as in the code in the answer. Now, if I want to build a stream of uniform random numbers between 1 and 21, if I just use the first stream but filter it so that numbers in [22, 25] are rejected, I can build that stream too. Next, if I take this stream and filter it so that for each element x I output x % 7 + 1, I have a stream of uniform random numbers from 1 to 7! Quite simple, isn't it? :D

Paggas 2009-05-05 06:14:04

@Eyal: +1, Great solution! While a 7% speed boost is not normally noteworthy, your solution has the property that, unlike mine, it never discards any randomness. My solution technically loses information in the case that i > 21. I'm afraid this comment box is too narrow to contain a detailed explanation. See also problem 2(a) (and its solution) of homework 1 at http://is.gd/x4gQ .

Adam Rosenfield 2009-05-06 02:06:03

@Nixuz: I've said it before and I'll say it again: there is NOT a guaranteed worst case constant time solution that generates a perfectly uniform distribution, because 5^n is not divisible by 7 for any positive integer n. Pax's solution is _very close_ to uniform, but it is not _perfectly_ uniform.

Adam Rosenfield 2009-05-08 03:14:51

I tested your algorithm in my harness and it does do marginally better at distribution (about a 2% improvement over 20 runs of 700000 samples). Your comment that my solution is a problem because there's no way to get from 0 to 6 in sequence is rubbish. ALL linear congruential methods have that problem and they seem to work fine. What matters is the distribution over a large enough set. If you're talking about true randomness (not something like linear congruential), that makes your solution provably incorrect since a sequence of 5,5,... ad infinitum is just as likely as any other :-)

paxdiablo 2009-05-08 03:57:35

@Pax: How is my comment about no way to get from 0 to 6 rubbish? There _is_ no way to get from 0 to 6 with yours, regardless of the source of the underlying rand5() function. The problem assumes a true RNG for rand5(), not a pseudorandom linear congruential generator. It's true that linear congruential generators have correlation problems (especially if you look at more than 2 consecutive outputs), but those are not the subject of this problem. (continuing...)

Adam Rosenfield 2009-05-08 04:07:51

(continued...) If rand5() produces truly random outputs, with all sequences of a given length equally likely, then my rand7() will also be truly random, with all sequences of a given length equally likely. Yours will not.

Adam Rosenfield 2009-05-08 04:08:22

If you're talking about a true RNG, then your solution is provably incorrect. Any input (sequence of numbers) that consistently stays above 21 in your calculation renders your code into an infinite loop. I would rather have a provably correct (in terms of execution time) algorithm with slightly worse distribution properties, especially as the question called, in part, for best CPU speed.

paxdiablo 2009-05-08 04:17:32

It boils down to what you want. You can either have a constant-time algorithm with near-perfect distribution or a variable-time algorithm with nearer-to-perfect distribution. I'm not confident that yours is perfect distribution either since it's effectively throwing away values from the rand5() distribution. It appears to be better based on sampling but that's not mathematical proof.

paxdiablo 2009-05-08 04:20:24

Yes, any input that consistently stays above 21 will cause an infinite loop, but such a sequence has 0 probability of occurring with a true RNG -- the limit of (4/25)^n as n goes to infinity is 0. But with a pseudo-RNG, there is a real danger of infinite looping. Of course, this could easily be solved by, say, looping at most a fixed number of times, which would then result in a non-uniform distribution. The degree of non-uniformity would depend on the maximum number of iterations.

Adam Rosenfield 2009-05-08 04:25:31
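The trade-off described in the comment above can be made concrete. The sketch below assumes the loop is capped at a fixed number of passes and falls back to some fixed value when every pass is rejected; that fallback policy is a hypothetical choice for illustration, not something the comment specifies. The chance of reaching the fallback is then (4/25)^n for n passes.

```python
from fractions import Fraction

REJECT = Fraction(4, 25)  # chance that one pass lands in 22..25

def fallback_probability(max_passes):
    """Probability that all max_passes passes are rejected."""
    return REJECT ** max_passes

for n in (1, 2, 5, 10):
    print(n, float(fallback_probability(n)))
```

Whatever value the fallback returns gets this much extra probability mass, so the cap directly sets the size of the bias.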

And you're correct that it boils down to whether you want a perfect distribution with unbounded worst case runtime, or an imperfect distribution with a bounded runtime. This is a consequence of the fact that no power of 5 is divisible by 7, or equivalently, if you have 5^n equally probable sequences of length n, there is no way to assign to each sequence a number from 1 to 7 such that each of 1..7 is equally probable.

Adam Rosenfield 2009-05-08 04:27:21

The proof that my algorithm is correct is contained in the comments in the code. After assigning i in the loop, all values from 1 to 25 are equally probable. If the loop terminates, then all values from 1 to 21 are equally probable, since for any k in [1, 21], P(i=k given loop terminated) = P(i=k)/P(loop terminated) (by definition of conditional probability) = (1/25)/(21/25) = 1/21 for all k. Thus, after performing the final modulus, all 7 values are equally likely. Consecutive samples are clearly independent of each other, since there is no saved state between calls.

Adam Rosenfield 2009-05-08 04:32:51

Err, that should be P(i=k given loop terminated) = P(i=k and loop terminated)/P(loop terminated) = (1/25)/(21/25) = 1/21 for 1 <= k <= 21, and it is 0 for 22 <= k <= 25, since P(i=k and loop terminated) is 0 for such k.

Adam Rosenfield 2009-05-08 04:37:25

We are going to have to agree to disagree on this one Adam. A true RNG does *not* have zero probability of generating billions of 5s in a row (or even an infinite number, for that matter), that's as likely as any other sequence. A pseudo RNG *does* have zero probability of doing that by virtue of the fact that it is deterministic. And I still don't believe you have a perfect distribution with exclusion since you're throwing away information from the sequence that does have perfect distribution. I've passed it onto some mathematicians at the state uni here to see what they say. Stay tuned.

paxdiablo 2009-05-08 07:54:28

A true RNG does not have zero probability of generating a billion 5's in a row (that has probability 1/5^(10^9)). The probability of generating an infinite sequence of 5's is 0. I may be throwing away information (I'm using an average of 2.38 calls to rand5() per call to rand7(), whereas an optimal algorithm would use an average of log(7)/log(5) = 1.209 calls), but I still produce a uniform distribution.

Adam Rosenfield 2009-05-08 14:35:22

I couldn't resist adding an attempt - if you had the time to critique it that would be appreciated

ShuggyCoUk 2009-05-13 17:40:43

I have to admit that this answer has the best accuracy, in 1 million tests there was a variance of no more than 500 between the numbers; other methods gave a variance as large as 100k.

The Wicked Flea 2009-06-01 14:12:45

Why all the trouble? Won't this work?? int i; do { i = rand5() + rand5(); } while(i > 7); return i; What's wrong with this??

Lazer 2009-09-18 11:10:42

@eSKay: No, that won't work, it produces a highly non-uniform distribution. It doesn't produce 1 at all, and it produces 2-7 with probabilities 1/19, 2/19, 3/19, 4/19, 5/19, and 4/19 respectively.

Adam Rosenfield 2009-09-18 15:37:33
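The distribution of the rand5() + rand5() loop discussed above can be worked out exactly by counting. This is a small sketch under the assumption that rand5() is uniform on 1..5; the names `ways` and `probabilities` are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Count the ways each sum of two rand5() results (1..5 each) can occur,
# keeping only the sums the loop would accept (i <= 7).
ways = Counter()
for first in range(1, 6):
    for second in range(1, 6):
        total = first + second
        if total <= 7:
            ways[total] += 1

accepted = sum(ways.values())
probabilities = {value: Fraction(ways[value], accepted) for value in range(1, 8)}
for value, p in probabilities.items():
    print(value, p)
```

Because the loop retries until a sum is accepted, each accepted sum's probability is its number of ways divided by the total number of accepted ways.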

`But there is an infinitesimally small probability of looping forever.` ;-/

Petr Peller 2010-05-12 08:55:21

"// i is now uniformly random between 1 and 21" should be "// i is now ALMOST uniformly random between 1 and 21"

understack 2010-08-18 11:22:07

Multiply by 4 instead of 5. Then numbers should be uniform between 1 and 21. Then while condition would not be required.

understack 2010-08-18 11:34:08

@understack: No. If you multiply by 4 instead of 5, it is no longer uniform. There are then 2 ways to get 5 (4*0+5 = 4*1+1) but only 1 way to get 1 (4*0+1).

Adam Rosenfield 2010-08-20 16:59:04

@Adam Rosenfield: that's true but I guess if you put a condition like while (i>21), it's not 'uniform' as well. Technically, filtering out these cases means uniformity is disturbed.

understack 2010-08-21 19:19:32

@understack: No, it _is_ uniform. Once the `while(i > 21)` loop terminates, all values between 1 and 21 are equally likely. Read up on rejection sampling http://en.wikipedia.org/wiki/Rejection_sampling .

Adam Rosenfield 2010-08-21 20:27:36

@Adam Rosenfield: You're right. Thanks for the explanation. :)

understack 2010-08-22 02:11:58

Answer 4

+3 A:

int r, ans = 0;
while (ans == 0)
{
     for (int i = 0; i < 3; i++)
     {
          while ((r = rand5()) == 3) {}  // discard 3 so that r < 3 is a fair bit
          ans += (r < 3) << i;           // set bit i with probability 1/2
     }
}

Nescio 2008-09-26 04:48:00

A correct solution, making an average of 30/7 = 4.29 calls to rand5() per call to rand7().

Adam Rosenfield 2009-05-08 04:12:50

Answer 5

+83 A:

// Return a random value between 1 and 7
// In case of complaints of non random results just
// sigh, roll your eyes and point out if it returned
// the values the user expected it wouldn't be random
// would it?
int random1_to_7()
{
  return random1_to_5();  
}

DrStalker 2008-09-26 04:51:16

<G> cute..

BCS 2008-09-26 04:52:19

Haha. http://xkcd.com/221/

James Baker 2008-10-04 18:44:51

Brilliant. I love it.

Prestaul 2009-01-15 03:44:38

Mandatory dilbert strip: http://tinyurl.com/3aav3f

agnul 2009-04-30 15:45:02

Not very helpful, and too obvious.

mafutrct 2009-05-04 05:48:30

Brilliant solution, I love it.

wasatz 2010-08-24 07:26:58

It's perfectly correct - the result is random, and the result is in the specified range. The question did *not* specify that the results were to be evenly distributed over the range.

caf 2010-10-05 23:57:12

Answer 6

+9 A:

rand7() = (rand5()+rand5()+rand5()+rand5()+rand5()+rand5()+rand5())%7+1

Edit: That doesn't quite work. It's off by about 2 parts in 1000. The buckets get:

value   Count  Error%
1       11177  +0.0208
2       11172  +0.0144
3       11158  -0.0035
4       11144  -0.0214
5       11144  -0.0214
6       11158  -0.0035
7       11172  +0.0144

By switching to a sum of n calls to rand5(), the error becomes:

n   Error%
10  +/- 1e-3
12  +/- 1e-4
14  +/- 1e-5
16  +/- 1e-6
...
28  +/- 3e-11

This seems to gain an order of magnitude for every 2 added.

BCS 2008-09-26 04:51:38

+1 vote. but i think no need for -7

Michael Buen 2009-01-15 05:02:18

Darn, you're right. Ain't %math fun?

BCS 2009-01-15 17:54:05

This is not a uniform distribution. It's _very close_ to uniform, but not perfectly uniform.

Adam Rosenfield 2009-01-15 18:06:10

Not a uniform distribution.

Jason S 2009-01-24 20:07:31

Ah! Dice and 7's. If you are going to say I'm wrong, you shouldn't leave the proof as an exercise for the reader.

BCS 2009-01-25 00:22:45

The proof that it's not uniform is simple: there are 5^7 possible ways the randomness can go, and as 5^7 is not a multiple of 7, it's not possible that all 7 sums are equally likely. (Basically, it boils down to 7 being relatively prime to 5, or equivalently 1/7 not being a terminating decimal in base 5.) In fact it's not even the "most uniform" possible under this constraint: direct computation shows that of the 5^7=78125 sums, the number of times you get values 1 to 7 is {1: 11177, 2: 11172, 3: 11158, 4: 11144, 5: 11144, 6: 11158, 7: 11172}.

ShreevatsaR 2009-04-30 16:05:35
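The direct computation mentioned in the comment above can be reproduced by brute force. This is a minimal sketch that assumes rand5() returns 1..5 and uses the formula exactly as written in the answer (sum of seven calls, mod 7, plus 1).

```python
from collections import Counter
from itertools import product

# All 5**7 equally likely outcomes of seven rand5() calls (values 1..5).
counts = Counter(sum(rolls) % 7 + 1 for rolls in product(range(1, 6), repeat=7))

total = sum(counts.values())
for value in range(1, 8):
    # Print the count and its deviation from the ideal probability 1/7.
    print(value, counts[value], counts[value] / total - 1 / 7)
```

Since the total number of outcomes is not a multiple of 7, the seven counts cannot all be equal, whatever the formula does with the sum.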

Answer 7

+6 A:

int randbit( void )
{
    while( 1 )
    {
        int r = rand5();
        if( r <= 4 ) return(r & 1);
    }
}

int randint( int nbits )
{
    int result = 0;
    while( nbits-- )
    {
        result = (result<<1) | randbit();
    }
    return( result );
}

int rand7( void )
{
    while( 1 )
    {
        int r = randint( 3 ) + 1;
        if( r <= 7 ) return( r );
    }
}

2008-09-26 05:03:20

A correct solution, making an average of 30/7 = 4.29 calls to rand5() per call to rand7().

Adam Rosenfield 2009-05-08 03:30:53

Answer 8

A:

solution in php

<?php
function random_5(){
    return rand(1,5);
}


function random_7(){
 $total = 0;

    for($i=0;$i<7;$i++){
     $total += random_5();
    }

    return ($total%7)+1; 
}

echo random_7();
?>

2008-11-27 20:21:43

Not a uniform distribution.

Adam Rosenfield 2009-04-30 21:07:04

Answer 9

+4 A:

The following produces a uniform distribution on {1, 2, 3, 4, 5, 6, 7} using a random number generator producing a uniform distribution on {1, 2, 3, 4, 5}. The code is messy, but the logic is clear.

public static int random_7(Random rg) {
    int returnValue = 0;
    while (returnValue == 0) {
        for (int i = 1; i <= 3; i++) {
            returnValue = (returnValue << 1) + SimulateFairCoin(rg);
        }
    }
    return returnValue;
}

private static int SimulateFairCoin(Random rg) {
    while (true) {
        int flipOne = random_5_mod_2(rg);
        int flipTwo = random_5_mod_2(rg);

        if (flipOne == 0 && flipTwo == 1) {
            return 0;
        }
        else if (flipOne == 1 && flipTwo == 0) {
            return 1;
        }
    }
}

private static int random_5_mod_2(Random rg) {
    return random_5(rg) % 2;
}

private static int random_5(Random rg) {
    return rg.Next(5) + 1;
}

Jason 2009-01-15 04:18:34

A correct solution (which puts you way ahead of the curve), although not very efficient. This makes an average of 25/6 = 4.17 calls to random_5_mod_2 per fair coin flip, for a total average of 100/7 = 14.3 calls to random_5() per call to random_7().

Adam Rosenfield 2009-05-08 03:28:36
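The call counts quoted in the comment above follow from a short calculation. The sketch below works it through with exact fractions, assuming random_5() is uniform on 1..5; the variable names are illustrative and do not appear in the answer.

```python
from fractions import Fraction

p_one = Fraction(3, 5)   # random_5() % 2 is 1 for the values 1, 3, 5
p_zero = Fraction(2, 5)  # ... and 0 for the values 2, 4

# A pair is kept only when the two flips differ: (0, 1) or (1, 0).
# Both kept outcomes have probability p_one * p_zero, so the coin is fair.
p_keep = 2 * p_one * p_zero
calls_per_coin = 2 / p_keep            # two biased flips per attempt

# Three fair bits give 0..7; the value 0 is rejected, so 7 of 8 tries succeed.
calls_per_rand7 = 3 * calls_per_coin / Fraction(7, 8)

print(calls_per_coin, calls_per_rand7)
```

The fairness argument is the usual one for this pairing trick: the two kept outcomes are the same product of probabilities in a different order.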

Answer 10

A:

For number 1, can someone explain what's wrong with this?

public class Sandbox {

private Random random = new Random();

public static void main(String[] args) {
    Sandbox sb = new Sandbox();
    sb.go();
}

private void go() {
    int [] places = new int[5];
    for (int i = 0; i < 10000000; i++) {
        int result = rand5();
        places[result] = places[result] + 1;
    }

    for (int i = 0; i < places.length; i++) {
        int place = places[i];
        System.out.println("#" + i + " = " + place);
    }
}

public int rand7() {
    return random.nextInt(7);
}

public int rand5() {
    int r = rand7();
    switch (r) {
        case 0:
        case 1:
        case 2:
        case 3:
        case 4:
            return r;
        case 5:
        case 6:
            return rand5();
        default:
            throw new IllegalStateException(r + "");
    }
}

}

tieTYT 2009-04-29 21:52:57

You've got the problem the wrong way up. We have a rand5() defined, but not rand7(). Unfortunately your exact solution won't work for that case, but instead needs something like Adam Rosenfield's.

Ant 2009-04-30 08:16:11

Answer 11

+1 A:

Assuming that rand(n) here means "random integer in a uniform distribution from 0 to n-1", here's a code sample using Python's randint(0, n-1), which has that effect. It uses only randint(0, 4), and constants, to produce the effect of randint(0, 6). A little silly, actually.

from random import randint

total = 7
while total >= 7:
    first = randint(0, 4)       # uniform over 0..4
    toadd = 9999
    while toadd > 1:            # retry until we get a fair 0 or 1
        toadd = randint(0, 4)
    if toadd:
        total = first + 5       # 5..9
    else:
        total = first           # 0..4

assert 0 <= total < 7
print(total)

Joshua Fox 2009-04-30 09:39:47

Answer 12

+16 A:

(I have stolen Adam Rosenfield's answer and made it run about 7% faster.)

Assume that rand5() returns one of {0,1,2,3,4} with equal distribution and the goal is return {0,1,2,3,4,5,6} with equal distribution.

int rand7() {
  int i = 5 * rand5() + rand5();
  int max = 25;
  //i is uniform among {0 ... max-1}
  while(i < max%7) {
    //i is uniform among {0 ... (max%7 - 1)}
    i *= 5;
    i += rand5(); //i is uniform {0 ... (((max%7)*5) - 1)}
    max %= 7;
    max *= 5; //once again, i is uniform among {0 ... max-1}
  }
  return(i%7);
}

We're keeping track of the largest value that the loop can make in the variable max. If the result so far is between max%7 and max-1 then the result will be uniformly distributed in that range. If not, we use the remainder, which is random between 0 and max%7-1, and another call to rand5() to make a new number and a new max. Then we start again.

Edit: The expected number of calls to rand5() is x in this equation:

x =  2     * 21/25
   + 3     *  4/25 * 14/20
   + 4     *  4/25 *  6/20 * 28/30
   + 5     *  4/25 *  6/20 *  2/30 * 7/10
   + 6     *  4/25 *  6/20 *  2/30 * 3/10 * 14/15
   + (6+x) *  4/25 *  6/20 *  2/30 * 3/10 *  1/15
x = about 2.21 calls to rand5()

Eyal 2009-04-30 13:56:02
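The equation for x in the answer above is linear in x, so it can be solved exactly. The following sketch copies the stage probabilities from the equation and solves it with fractions; the names `stops` and `restart` are made up for this illustration.

```python
from fractions import Fraction as F

# (calls used, probability of stopping at that point), copied from the equation.
stops = [
    (2, F(21, 25)),
    (3, F(4, 25) * F(14, 20)),
    (4, F(4, 25) * F(6, 20) * F(28, 30)),
    (5, F(4, 25) * F(6, 20) * F(2, 30) * F(7, 10)),
    (6, F(4, 25) * F(6, 20) * F(2, 30) * F(3, 10) * F(14, 15)),
]
# Probability of falling through every stage and starting over.
restart = F(4, 25) * F(6, 20) * F(2, 30) * F(3, 10) * F(1, 15)

# x = sum(calls * p) + (6 + x) * restart, solved for x.
x = (sum(calls * p for calls, p in stops) + 6 * restart) / (1 - restart)
print(float(x))
```

A quick sanity check on the setup is that the stopping probabilities and the restart probability add up to exactly 1.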

Results cataloged in 1,000,000 tries: 1=47216; 2=127444; 3=141407; 4=221453; 5=127479; 6=167536; 7=167465. As you can see, distribution is lacking in respect to the odds of getting a 1.

The Wicked Flea 2009-06-01 14:02:50

@The Wicked Flea: I think you're mistaken. Are you sure that the input rand5() you were using for your test produced 0-4 instead of 1-5, as specified in this solution?

Adam Rosenfield 2009-06-10 00:38:11

Answer 13

A:

A constant time solution that produces approximately uniform distribution. ~~The trick is 625 happens to be cleanly divisible by 7 and you can get uniform distributions as you build up to that range.~~

Edit: My bad, I miscalculated, but instead of pulling it I'll leave it in case someone finds it useful/entertaining. It does actually work after all... :)

int rand5()
{
    return (rand() % 5) + 1;
}

int rand25()
{ 
    return (5 * (rand5() - 1) + rand5());
}

int rand625()
{
    return (25 * (rand25() - 1) + rand25());
}

int rand7()
{
    return ((625 * (rand625() - 1) + rand625()) - 1) % 7 + 1;
}

pbhogan 2009-04-30 15:14:08

"625 happens to be cleanly divisible by 7" - guess again. 625 = 5^4 is not divisible by 7.

Steve Jessop 2009-04-30 16:32:33

Thanks, you are quite correct. Apple's calculator lied to me (or rather I forgot it doesn't have decimals in "programmer" mode).

pbhogan 2009-04-30 21:16:15

Answer 14

A:

int rand7()
{
    return (rand5()+rand5()+rand5()-3)/2+1;
}

HugoAbreu 2009-04-30 17:17:36

Not a uniform distribution.

Adam Rosenfield 2009-04-30 21:04:53

Answer 15

A:

int rand7()
{
    int zero_one_or_two = ( rand5() + rand5() - 1 ) % 3 ;
    return rand5() + zero_one_or_two ;
}

mangokun 2009-04-30 17:43:03

Not a uniform distribution.

Adam Rosenfield 2009-04-30 21:05:01

Answer 16

A:

I feel stupid in front of all these complicated answers.

Why can't it be :

int random1_to_7()
{
  return (random1_to_5() * 7) / 5;  
}

?

e-satis 2009-04-30 17:46:33

Test this - it doesn't work. It won't provide an even distribution across all 7 numbers.

Jon Tackabury 2009-04-30 18:25:15

This would work if we were interested in real numbers, but since we're dealing with ints, that code will only produce 1, 2, 4, 5, or 7, and never 3 or 6.

ESRogs 2009-04-30 19:00:52

OK thks. Random is always a tricky subject, isn't it ?

e-satis 2009-04-30 19:08:54

Answer 17

A:

#!/usr/bin/env ruby
class Integer
  def rand7
    rand(7)+1   # rand(7) yields 0..6, so this is 1..7
  end
end

def rand5
  rand(5)+1   # rand(5) yields 0..4, so this is 1..5
end

x = rand5() # x => int between 1 and 5

y = x.rand7() # y => int between 1 and 7

..although that may possibly be considered cheating..

dbr 2009-04-30 17:59:51

Answer 18

A:

I have played around and written a "testing environment" for this Rand(7) algorithm. For example, if you want to see what distribution your algorithm gives, or how many iterations it takes to generate all distinct random values (for Rand(7), 1-7), you can use it.

My core algorithm is this:

return (Rand5() + Rand5()) % 7 + 1;

Well, it is no less uniformly distributed than Adam Rosenfield's one (which I included in my snippet code).

private static int Rand7WithRand5()
{
    //PUT YOUR FAVOURITE ALGORITHM HERE//

    //1. Stackoverflow winner
    int i;
    do
    {
        i = 5 * (Rand5() - 1) + Rand5(); // i is now uniformly random between 1 and 25
    } while (i > 21);
    // i is now uniformly random between 1 and 21
    return i % 7 + 1;

    //My 2 cents
    //return (Rand5() + Rand5()) % 7 + 1;
}

This "testing environment" can take any Rand(n) algorithm and test and evaluate it (distribution and speed). Just put your code into the "Rand7WithRand5" method and run the snippet.

A few observations:

Adam Rosenfield's algorithm is no better distributed than, for example, mine. Anyway, both algorithms' distribution is horrible.
The native Rand7 (random.Next(1, 8)) generates all members of the given interval in around 200+ iterations; the Rand7WithRand5 algorithms take on the order of 10k (around 30-70k).
The real challenge is not to write a method that generates Rand(7) from Rand(5), but to generate values that are more or less uniformly distributed.

Peter Stegnar 2009-05-03 10:22:05

No, your algorithm does not produce a uniform distribution. It produces 1..7 with probabilities 4/25, 3/25, 3/25, 3/25, 3/25, 4/25, 5/25, as can easily be verified by counting all 25 possible outcomes. 25 is not divisible by 7. Your test for uniformity is also flawed -- the number of trials needed to get every number has a complicated distribution, see http://is.gd/wntB . You need to perform your test thousands of times, not once. A better test would be to call the RNG thousands of times and compare the number of occurrences of each outcome.

Adam Rosenfield 2009-05-03 15:53:23
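The better test suggested in the comment above (call the generator many times and compare the counts) can be sketched in a few lines of Python. Here `rand5` is a stand-in built on `random.randint`, and `sum_mod_7` and `histogram` are hypothetical helper names; the exact counts over all 25 input pairs are computed alongside for comparison.

```python
import random
from collections import Counter

def rand5():
    """Stand-in for the given generator: uniform integer in 1..5."""
    return random.randint(1, 5)

def sum_mod_7():
    """The (Rand5() + Rand5()) % 7 + 1 candidate from the answer."""
    return (rand5() + rand5()) % 7 + 1

def histogram(generator, samples=70000):
    """Call generator many times and count how often each value appears."""
    return Counter(generator() for _ in range(samples))

# Exact counts over the 25 equally likely input pairs, for comparison.
exact = Counter((a + b) % 7 + 1 for a in range(1, 6) for b in range(1, 6))

if __name__ == "__main__":
    observed = histogram(sum_mod_7)
    for value in range(1, 8):
        print(value, observed[value], exact[value] / 25)
```

Any other candidate can be passed to `histogram` in place of `sum_mod_7`; a uniform generator should show roughly equal counts for all seven values.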

Answer 19

A:

Here is a working Python implementation of Adam's answer.

import random

def rand5():
    return random.randint(1, 5)

def rand7():
    while True:
        r = 5 * (rand5() - 1) + rand5()
        # r is now uniformly random between 1 and 25
        if r <= 21:
            break
    # result is now uniformly random between 1 and 7
    return r % 7 + 1

I like to throw algorithms I'm looking at into Python so I can play around with them, thought I'd post it here in the hopes that it is useful to someone out there, not that it took long to throw together.

James McMahon 2009-05-04 04:57:24

No, that is quite dissimilar from my answer. You're looping 21 times and discarding the first 20 iterations' results. You're also using a rand4() and a rand5() as input, which quite obviously breaks the rules of using only rand5(). Finally, you produce a non-uniform distribution.

Adam Rosenfield 2009-05-05 13:28:31

Sorry about that. I was pretty tired when I looked this question over, tired enough that I completely misread your algorithm. I actually threw it into Python because I couldn't understand why you were looping 21 times. Makes a lot more sense now. I did the random.randint(1, 4) thing as a shorthand but I guess you are correct, it is against the spirit of the question. I've corrected the code.

James McMahon 2009-05-06 00:12:42

Why the vote down for this? Please post if you find an error in the code.

James McMahon 2009-12-03 12:39:48

Answer 20

A:

By using a rolling total, you can both

maintain an equal distribution; and
not have to sacrifice any element in the random sequence.

Both these problems are an issue with the simplistic rand(5)+rand(5)...-type solutions. The following Python code shows how to implement it (most of this is proving the distribution).

import random
x = []
for i in range (0,7):
    x.append (0)
t = 0
tt = 0
for i in range (0,700000):
    ########################################
    #####            qq.py             #####
    r = int (random.random () * 5)
    t = (t + r) % 7
    ########################################
    #####       qq_notsogood.py        #####
    #r = 20
    #while r > 6:
        #r =     int (random.random () * 5)
        #r = r + int (random.random () * 5)
    #t = r
    ########################################
    x[t] = x[t] + 1
    tt = tt + 1
high = x[0]
low = x[0]
for i in range (0,7):
    print "%d: %7d %.5f" % (i, x[i], 100.0 * x[i] / tt)
    if x[i] < low:
        low = x[i]
    if x[i] > high:
        high = x[i]
diff = high - low
print "Variation = %d (%.5f%%)" % (diff, 100.0 * diff / tt)

And this output shows the results:

pax$ python qq.py
0:   99908 14.27257
1:  100029 14.28986
2:  100327 14.33243
3:  100395 14.34214
4:   99104 14.15771
5:   99829 14.26129
6:  100408 14.34400
Variation = 1304 (0.18629%)

pax$ python qq.py
0:   99547 14.22100
1:  100229 14.31843
2:  100078 14.29686
3:   99451 14.20729
4:  100284 14.32629
5:  100038 14.29114
6:  100373 14.33900
Variation = 922 (0.13171%)

pax$ python qq.py
0:  100481 14.35443
1:   99188 14.16971
2:  100284 14.32629
3:  100222 14.31743
4:   99960 14.28000
5:   99426 14.20371
6:  100439 14.34843
Variation = 1293 (0.18471%)

A simplistic rand(5)+rand(5), ignoring those cases where this returns more than 6 has a typical variation of 18%, 100 times that of the method shown above:

pax$ python qq_notsogood.py
0:   31756 4.53657
1:   63304 9.04343
2:   95507 13.64386
3:  127825 18.26071
4:  158851 22.69300
5:  127567 18.22386
6:   95190 13.59857
Variation = 127095 (18.15643%)

pax$ python qq_notsogood.py
0:   31792 4.54171
1:   63637 9.09100
2:   95641 13.66300
3:  127627 18.23243
4:  158751 22.67871
5:  126782 18.11171
6:   95770 13.68143
Variation = 126959 (18.13700%)

pax$ python qq_notsogood.py
0:   31955 4.56500
1:   63485 9.06929
2:   94849 13.54986
3:  127737 18.24814
4:  159687 22.81243
5:  127391 18.19871
6:   94896 13.55657
Variation = 127732 (18.24743%)

And, on the advice of Nixuz, I've cleaned the script up so you can just extract and use the rand7... stuff:

import random

# rand5() returns 0 through 4 inclusive.

def rand5():
    return int (random.random () * 5)

# rand7() generator returns 0 through 6 inclusive (using rand5()).

def rand7():
    rand7ret = 0
    while True:
        rand7ret = (rand7ret + rand5()) % 7
        yield rand7ret

# Number of test runs.

count = 700000

# Work out distribution.

distrib = [0,0,0,0,0,0,0]
rgen =rand7()
for i in range (0,count):
    r = rgen.next()
    distrib[r] = distrib[r] + 1

# Print distributions and calculate variation.

high = distrib[0]
low = distrib[0]
for i in range (0,7):
    print "%d: %7d %.5f" % (i, distrib[i], 100.0 * distrib[i] / count)
    if distrib[i] < low:
        low = distrib[i]
    if distrib[i] > high:
        high = distrib[i]
diff = high - low
print "Variation = %d (%.5f%%)" % (diff, 100.0 * diff / count)

paxdiablo 2009-05-07 03:43:40

Your definition of variance is completely different from the standard statistical definition of variance.

Adam Rosenfield 2009-05-08 01:17:50

Variance has many meanings of which the statistical is one, I was obviously using another (which may or may not be in an English dictionary :-) - I was just looking for a word that could be used for the percentage difference between the highest and lowest occurrence (variance, variation, variability), the point being to show the relative distribution-ness of the different methods. [And, yes, I know distribution-ness is almost certainly not a real word :-) ]. Anyway, I'll change it to keep you happy.

paxdiablo 2009-05-08 01:50:41

Your script is a total mess, but this answer works.

Nixuz 2009-05-08 02:37:11

I have posted a cleaner implementation of this method here: http://rafb.net/p/AQXiVL18.html

Nixuz 2009-05-08 03:03:36

Again, like many other solutions, this does not achieve a perfectly uniform distribution. After the first several numbers, it approximates a uniform distribution to many orders of magnitude. The bigger problem, though, is correlation between consecutive numbers produced. If x and y are two consecutive numbers in the RNG sequence, all 49 possible pairs are not even close to being equally probable -- only 35 of the 49 pairs are achievable.

Adam Rosenfield 2009-05-08 03:18:22

Err, let me rephrase that. Given that a particular x was produced at some point in the sequence, only 5 of 7 numbers can be produced for the next number in the sequence. A true RNG would have all samples be independent of one another, but in this case they are clearly not.

Adam Rosenfield 2009-05-08 03:20:03
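The correlation described in the two comments above is easy to see by listing, for each current value, which values the rolling total can produce next. This sketch assumes rand5() returns 0..4, as in the answer's cleaned-up script.

```python
# From state x, the rolling total can only move to (x + r) % 7 for r in 0..4.
successors = {x: sorted({(x + r) % 7 for r in range(5)}) for x in range(7)}

reachable_pairs = sum(len(nexts) for nexts in successors.values())
for x, nexts in successors.items():
    print(x, "->", nexts)
print(reachable_pairs, "of", 7 * 7, "ordered pairs can occur")
```

With independent samples, every one of the 49 ordered pairs of consecutive outputs would be possible.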

And that is true of any linear congruential RNG as well, Adam, given the formula they use. Short of having a "Schrodinger's Cat" device in your PC or basing it on some other truly random thing, you're not going to get any better.

paxdiablo 2009-05-08 03:29:17

@Pax: see my comments in my answer. We're not talking about linear congruential RNGs here.

Adam Rosenfield 2009-05-08 04:09:56

@Adam, that was an assumption *you* introduced in the comments to your answer. It was not in the original question and I would be more likely to believe that rand5() is a deterministic RNG rather than truly random.

paxdiablo 2009-05-08 07:56:50

It's true that the original question doesn't specify if the input and output functions produce independent and identically-distributed (iid) samples, but I think it's a reasonable expectation that if the input rand5() is iid, then the output rand7() should also be iid. If you don't think that's reasonable, have fun using your non-iid RNG.

Adam Rosenfield 2009-05-08 19:54:36

So, what's the word from the mathematicians at the university?

Adam Rosenfield 2009-05-12 02:52:41

Still waiting, I only see them once a fortnight for our Sunday bike ride and I'm taking number 1 son to the other side of the country tomorrow so it'll be at least another two weeks.

paxdiablo 2009-05-12 02:59:05

This solution is clearly broken. It's obvious that you need to be calling rand5 (on average) more than once per call to rand7 and this solution doesn't. Therefore the results cannot be random by any sane definition of random.

Chris Suter 2009-09-09 04:11:26

@Chris, the only sane definition of random is "random", and that means you cannot say *anything* about the distribution except over an infinite sample space. If you have a convincing proof that it's broken other than your vague feelings, let's see it. Pseudo-random numbers are only required to have a roughly equal distribution with a large repeat cycle. This solution has both of those features.

paxdiablo 2009-09-09 04:59:39

@Pax At every iteration of your function, it can only return one of five different values (albeit in the range 0-6). The very first iteration can only return a number in the range 0-4. So, it should be clear that whilst your function may have uniform distribution, the samples are not independent i.e. they're correlated which isn't something you want in a random number generator.

Chris Suter 2009-09-09 05:57:56

If you understand how linear congruential algorithms work, Chris, you'll know that determinism is a feature of them all. The next value is *wholly* dependent on the one that came before. The trick is to make it seem random by careful selection of the algorithm. If you wanted a truly random sequence, you'd be using quantum effects or a more random seed such as time between user keypresses.

paxdiablo 2009-09-09 06:51:13

@Pax Ignoring the fact that your algorithm isn't actually a conventional LCG algorithm, LCG algorithms are considered low quality pseudorandom number generators. We're not told what the quality of rand5() is, but we must assume that it's high quality and that you'd want the output stream to be no worse. That's not the case with your algorithm.

Chris Suter 2009-09-09 10:12:44

Answer 21

+94 A:

This is equivalent to Adam Rosenfield's solution, but may be a bit more clear for some readers. It assumes rand5() is a function that returns a statistically random integer in the range 1 through 5 inclusive.

int rand7()
{
    int vals[5][5] = {
        { 1, 2, 3, 4, 5 },
        { 6, 7, 1, 2, 3 },
        { 4, 5, 6, 7, 1 },
        { 2, 3, 4, 5, 6 },
        { 7, 0, 0, 0, 0 }
    };

    int result = 0;
    while (result == 0)
    {
        int i = rand5();
        int j = rand5();
        result = vals[i-1][j-1];
    }
    return result;
}

How does it work? Think of it like this: imagine printing out this double-dimension array on paper, tacking it up to a dart board and randomly throwing darts at it. If you hit a non-zero value, it's a statistically random value between 1 and 7, since there are an equal number of non-zero values to choose from. If you hit a zero, just keep throwing the dart until you hit a non-zero. That's what this code is doing: the i and j indexes randomly select a location on the dart board, and if we don't get a good result, we keep throwing darts.

Like Adam said, this can run forever in the worst case, but statistically the worst case never happens. :)
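
The dart-board picture can be checked directly. Here is a minimal Python sketch (not part of the original answer) that mirrors the table above. It assumes a stand-in rand5() built on random.randint, and the names VALS and cell_counts are only illustrative. It counts how often each value appears among the 25 equally likely cells:

```python
import random
from collections import Counter

# Same table as in the C code above; 0 marks a cell that forces a re-throw.
VALS = [
    [1, 2, 3, 4, 5],
    [6, 7, 1, 2, 3],
    [4, 5, 6, 7, 1],
    [2, 3, 4, 5, 6],
    [7, 0, 0, 0, 0],
]

def rand5():
    """Stand-in for the given generator (assumption): uniform integer in 1..5."""
    return random.randint(1, 5)

def rand7():
    """Throw darts at the table until a non-zero cell is hit."""
    result = 0
    while result == 0:
        result = VALS[rand5() - 1][rand5() - 1]
    return result

# Count how many of the 25 cells hold each value.
cell_counts = Counter(v for row in VALS for v in row)
print(sorted(cell_counts.items()))
```

Each of the values 1 to 7 occupies three cells and four cells are zero, which is why a non-zero hit is equally likely to be any of the seven values.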

Rob McAfee 2009-05-08 22:59:54

Nice... this is a much easier way to visualize what's happening in other solutions.

greg7gkb 2009-06-11 06:02:08

Great visualization!

Dinah 2009-06-23 13:40:23

good for visualisation, but not particularly scalable...

David_001 2010-02-27 09:00:14

Answer 22

+3 A:

Consider the additional constraint of giving the most efficient answer, i.e. one that, given an input stream I of m uniformly distributed integers from 1-5, outputs a stream O of uniformly distributed integers from 1-7 that is as long as possible relative to m, say of length L(m).

The simplest way to analyse this is to treat the streams I and O as 5-ary and 7-ary numbers respectively. This is achieved by the main answer's idea of taking the stream a1, a2, a3,... -> a1+5*a2+5^2*a3+.. and similarly for stream O.

Take a section of the input stream of length m and choose n s.t. 5^m-7^n=c, where c>0 is as small as possible. Then there is a uniform map from the input stream of length m to integers from 1 to 5^m, and another uniform map from integers from 1 to 7^n to the output stream of length n, where we may have to discard a few cases from the input stream when the mapped integer exceeds 7^n.

So this gives a value for L(m) of around m (log5/log7) which is approximately .82m.

The difficulty with the above analysis is the equation 5^m-7^n=c which is not easy to solve exactly and the case where the uniform value from 1 to 5^m exceeds 7^n and we lose efficiency.

The question is how close to the best possible value of m (log5/log7) we can get. For example, when this number is close to an integer, can we find a way to achieve exactly that integral number of output values?

If 5^m-7^n=c then from the input stream we effectively generate a uniform random number from 0 to (5^m)-1 and don't use any values higher than 7^n. However these values can be rescued and used again. They effectively generate a uniform sequence of numbers from 1 to 5^m-7^n. So we can then try to use these and convert them into 7-ary numbers so that we can create more output values.

Let T7(X) be the average length of the output sequence of random(1-7) integers derived from a uniform input of size X, and assume that 5^m=7^n0+7^n1+7^n2+...+7^nr+s, s<7.

Then T7(5^m)=n0x7^n0/5^m + ((5^m-7^n0)/5^m) T7(5^m-7^n0), since we get a sequence of length n0 with probability 7^n0/5^m, and a residual of size 5^m-7^n0 with probability (5^m-7^n0)/5^m.

If we just keep substituting we obtain:

T7(5^m) = n0x7^n0/5^m + n1x7^n1/5^m + ... + nrx7^nr/5^m = (n0x7^n0 + n1x7^n1 + ... + nrx7^nr)/5^m

Hence L(m)=T7(5^m)=(n0x7^n0 + n1x7^n1 + ... + nrx7^nr)/(7^n0+7^n1+7^n2+...+7^nr+s)

Another way of putting this is:

If 5^m has 7-ary representation a0+a1*7 + a2*7^2 + a3*7^3+...+ar*7^r Then L(m) = (a1*7 + 2a2*7^2 + 3a3*7^3+...+rar*7^r)/(a0+a1*7 + a2*7^2 + a3*7^3+...+ar*7^r)
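
The 7-ary form of L(m) is easy to evaluate exactly. The following is a rough Python sketch of that formula only; the function names base7_digits and L are made up for illustration, and exact rational arithmetic is used so no rounding creeps in:

```python
from fractions import Fraction

def base7_digits(n):
    """Return the base-7 digits of n, least significant first (a0, a1, ...)."""
    digits = []
    while n > 0:
        n, d = divmod(n, 7)
        digits.append(d)
    return digits

def L(m):
    """Expected number of rand7 outputs from m rand5 inputs, per the formula above."""
    digits = base7_digits(5 ** m)
    numerator = sum(i * a * 7 ** i for i, a in enumerate(digits))
    return Fraction(numerator, 5 ** m)

for m in (2, 6, 11):
    # Print m, L(m), and the output-per-input ratio L(m)/m.
    print(m, float(L(m)), float(L(m)) / m)
```

For m = 2 the digits of 25 are a0 = 4, a1 = 3, so L(2) = 21/25, i.e. one output digit unless one of the four leftover values comes up.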

The best possible case is my original one above where 5^m=7^n+s, where s<7.

Then T7(5^m) = nx(7^n)/(7^n+s) = n+o(1) = m (Log5/Log7)+o(1) as before.

The worst case is when we can only find k and s s.t. 5^m = kx7+s.

Then T7(5^m) = 1x(k.7)/(k.7+s) = 1+o(1).

Other cases are somewhere inbetween. It would be interesting to see how well we can do for very large m, i.e. how good can we get the error term:

T7(5^m) = m (Log5/Log7)+e(m).

It seems impossible to achieve e(m) = o(1) in general but hopefully we can prove e(m)=o(m).

The whole thing then rests on the distribution of the 7-ary digits of 5^m for various values of m.

I'm sure there is a lot of theory out there that covers this I may have a look and report back at some point.

Ivan 2009-05-09 00:11:35

+2 (if I could)--this was the only good answer (as opposed to merely adequate). You've got the second best answer that will fit in 32 bit integers.

Rex Kerr 2010-03-10 19:39:37

Answer 23

A:

This answer is more an experiment in obtaining the most entropy possible from the Rand5 function. It is therefore somewhat unclear and almost certainly a lot slower than other implementations.

Assuming the uniform distribution from 0-4 and resulting uniform distribution from 0-6:

public class SevenFromFive
{
  public SevenFromFive()
  {
    // this outputs a uniform distribution but for some reason including it
    // screws up the output distribution
    // open question Why?
    this.fifth = new ProbabilityCondensor(5, b => {});
    this.eigth = new ProbabilityCondensor(8, AddEntropy);
  } 

  private static Random r = new Random();
  private static uint Rand5()
  {
    return (uint)r.Next(0,5);
  }

  private class ProbabilityCondensor
  {
    private readonly int samples;
    private int counter;
    private int store;
    private readonly Action<bool> output;

    public ProbabilityCondensor(int chanceOfTrueReciprocal,
      Action<bool> output)
    {
      this.output = output;
      this.samples = chanceOfTrueReciprocal - 1;  
    }

    public void Add(bool bit)
    {
      this.counter++;
      if (bit)
        this.store++;   
      if (counter == samples)
      {
        bool? e;
        if (store == 0)
          e = false;
        else if (store == 1)
          e = true;
        else
          e = null;// discard for now       
        counter = 0;
        store = 0;
        if (e.HasValue)
          output(e.Value);
      }
    }
  }

  ulong buffer = 0;
  const ulong Mask = 7UL;
  int bitsAvail = 0;
  private readonly ProbabilityCondensor fifth;
  private readonly ProbabilityCondensor eigth;

  private void AddEntropy(bool bit)
  {
    buffer <<= 1;
    if (bit)
      buffer |= 1;      
    bitsAvail++;
  }

  private void AddTwoBitsEntropy(uint u)
  {
    buffer <<= 2;
    buffer |= (u & 3UL);    
    bitsAvail += 2;
  }

  public uint Rand7()
  {
    uint selection;   
    do
    {
      while (bitsAvail < 3)
      {
        var x = Rand5();
        if (x < 4)
        {
          // put the two low order bits straight in
          AddTwoBitsEntropy(x);
          fifth.Add(false);
        }
        else
        { 
          fifth.Add(true);
        }
      }
      // read 3 bits
      selection = (uint)((buffer & Mask));
      bitsAvail -= 3;     
      buffer >>= 3;
      if (selection == 7)
        eigth.Add(true);
      else
        eigth.Add(false);
    }
    while (selection == 7);   
    return selection;
  }
}

The number of bits added to the buffer per call to Rand5 is currently 4/5 * 2 so 1.6. If the 1/5 probability value is included that increases by 0.05 so 1.65 but see the comment in the code where I have had to disable this.

Bits consumed by call to Rand7 = 3 + 1/8 * (3 + 1/8 * (3 + 1/8 * (...
This is 3 + 3/8 + 3/64 + 3/512 ... so approx 3.42
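
As a quick arithmetic check of that series (a small Python sketch, nothing more), the partial sums can be compared with the closed form of the geometric series, 3 / (1 - 1/8):

```python
# Partial sums of 3 + 3/8 + 3/64 + ... compared with the closed form 3 / (1 - 1/8).
partial = sum(3 / 8 ** k for k in range(40))
closed_form = 3 / (1 - 1 / 8)   # equals 24/7
print(partial, closed_form)
```

Both come out at 24/7, roughly 3.43, in line with the approximation quoted above.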

By extracting information from the sevens I reclaim 1/8*1/7 bits per call so about 0.018

This gives a net consumption of 3.4 bits per call, which means the ratio is 2.125 calls to Rand5 for every Rand7. The optimum should be 2.1.

I would imagine this approach is significantly slower than many of the other ones here unless the cost of the call to Rand5 is extremely expensive (say calling out to some external source of entropy).

ShuggyCoUk 2009-05-13 17:38:16

Your solution appears correct, aside from some simple errors: "if(count > 1)" should be "if(count <= 1)", and the "i++" that occurs shortly thereafter should be inside the curly braces that precede it. I'm not sure whether or not BitsSet() is correct, but that's somewhat irrelevant.

Adam Rosenfield 2009-05-13 18:51:43

Overall, though, your function is very difficult to understand. It does make a _slightly_ better use of entropy than it otherwise could, at the cost of more complication. There's also no reason to initially fill the buffer with 35 random bits on the first call, when 3 would suffice.

Adam Rosenfield 2009-05-13 18:56:35

I corrected the <= thanks, the i++ really should be there though. It should happen on the zero and the 1 case (adding a 1 or a zero respectively to the buffer). This is absolutely not what I would suggest using, it's horribly complicated. I was just interested in how close I could get to the theoretical entropy limits inherent in the problem... Thanks for the feedback. Ironically the filling of the buffer on the first call was to make it simpler to write :)

ShuggyCoUk 2009-05-13 20:35:10

I reworked this to be easier to understand (at the cost of speed) but also made it correct. It is not optimum yet, for some reason the 1/5 bits cause issues even though they are uniform in count.

ShuggyCoUk 2009-05-14 10:18:35

Answer 24

+46 A:

I'd like to add another answer, in addition to my first answer. This answer attempts to minimize the number of calls to rand5() per call to rand7(), to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities with the logic presented in Ivan's answer.

The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log₂ N. Thus, rand5() has approximately 2.32193 bits of entropy, and rand7() has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5(), and apply them to generating 2.80735 bits of entropy needed for each call to rand7(). The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5() per call to rand7().
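
These figures can be reproduced in a few lines. A small Python sketch, with variable names chosen here only for illustration:

```python
import math

bits_rand5 = math.log2(5)   # entropy of one uniform draw from 5 states
bits_rand7 = math.log2(7)   # entropy of one uniform draw from 7 states
min_calls = bits_rand7 / bits_rand5   # lower bound on rand5 calls per rand7 call
print(round(bits_rand5, 5), round(bits_rand7, 5), round(min_calls, 5))
```

The ratio does not depend on the base of the logarithm, which is why log(7)/log(5) gives the same bound.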

Side notes: all logarithms in this answer will be base 2 unless specified otherwise. rand5() will be assumed to return numbers in the range [0, 4], and rand7() will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.

So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number -- we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number 0.a₁a₂a₃..., where each digit a_i is chosen by a call to rand5(). For example, if our RNG chose a_i = 1 for all i, then ignoring the fact that that isn't very random, that would correspond to the real number 1/5 + 1/5² + 1/5³ + ... = 1/4 (sum of a geometric series).

Ok, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a, b] equals the length of that interval, b - a. The proof is left as an exercise for the reader =).

Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7(). How do we do this? Just the reverse of what we just did -- we convert it to an infinitely precise decimal in base 7, and then each base 7 digit will correspond to one output of rand7().

Taking the example from earlier, if our rand5() produces an infinite stream of 1's, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite decimal 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.

Ok, so we have the main idea, but we have two problems left: we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Secondly, how do we actually convert it to base 7?

One way we can convert a number between 0 and 1 to base 7 is as follows:

1) Multiply by 7
2) The integral part of the result is the next base 7 digit
3) Subtract off the integral part, leaving only the fractional part
4) Goto step 1
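
The conversion steps above can be sketched in Python using exact fractions. This is an illustration only, and the helper name base7_fraction_digits is made up here. It stops after a fixed number of digits rather than looping forever:

```python
from fractions import Fraction

def base7_fraction_digits(x, count):
    """Return the first `count` base-7 digits of x, where 0 <= x < 1."""
    digits = []
    for _ in range(count):
        x *= 7                 # multiply by 7
        d = int(x)             # the integral part is the next base-7 digit
        digits.append(d)
        x -= d                 # keep only the fractional part
    return digits

print(base7_fraction_digits(Fraction(1, 4), 6))
```

For the running example of 1/4 this yields the digits 1, 5, 1, 5, 1, 5.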

To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5() twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5() produce, the random real number we're generating will never be larger than 0.12: it is always true that 0.11 ≤ 0.11xyz... < 0.12.

So, keeping track of the current number so far, and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k digits, then we can safely output the next k digits -- regardless of what the infinite stream of base 5 digits are, they will never affect the next k digits of the base 7 representation!
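
To make the bounding argument concrete, here is a hedged Python sketch. It is separate from the implementation in this answer, and determined_base7_digits is a hypothetical helper name. It takes a finite prefix of base-5 digits and returns the base-7 digits that are already fixed, treating the unknown number as lying in the half-open interval between the prefix and the prefix with its last digit bumped by one:

```python
from fractions import Fraction

def determined_base7_digits(base5_prefix):
    """Base-7 digits fixed by a finite base-5 prefix, whatever digits follow."""
    # The unknown number lies in [lo, hi): lo is the prefix itself, and hi is
    # the prefix with its last digit increased by one.
    lo = Fraction(0)
    scale = Fraction(1)
    for digit in base5_prefix:
        scale /= 5
        lo += digit * scale
    hi = lo + scale

    digits = []
    while True:
        d = int(lo * 7)            # leading base-7 digit of the lower bound
        if hi * 7 > d + 1:         # interval straddles a digit boundary: stop
            return digits
        digits.append(d)
        lo = lo * 7 - d            # drop the digit we just emitted
        hi = hi * 7 - d

print(determined_base7_digits([1, 1]))
```

With the prefix 0.11 (base 5) the interval is [6/25, 7/25), and only the first base-7 digit, 1, is determined; more rand5() digits are needed before the second can be output.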

And that's the algorithm -- to generate the next output of rand7(), we generate only as many digits of rand5() as we need to ensure that we know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness:

import math
import random

rand5_calls = 0
def rand5():
    global rand5_calls
    rand5_calls += 1
    return random.randint(0, 4)

def rand7_gen():
    state = 0
    pow5 = 1
    pow7 = 7
    while True:
        # Integer (floor) division throughout, so the results stay exact.
        if state // pow5 == (state + pow7) // pow5:
            result = state // pow5
            state = (state - result * pow5) * 7
            pow7 *= 7
            yield result
        else:
            state = 5 * state + pow7 * rand5()
            pow5 *= 5

if __name__ == '__main__':
    r7 = rand7_gen()
    N = 10000
    x = list(next(r7) for i in range(N))
    distr = [x.count(i) for i in range(7)]
    expmean = N / 7.0
    expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))

    print('%d TRIALS' % N)
    print('Expected mean: %.1f' % expmean)
    print('Expected standard deviation: %.1f' % expstddev)
    print()
    print('DISTRIBUTION:')
    for i in range(7):
        print('%d: %d   (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev))
    print()
    print('Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N))

Note that rand7_gen() returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7) 10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.

Also note that the numbers here get very big, very fast. Powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to bignum arithmetic. But remember here, my goal was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).

In one run of this, I made 12091 calls to rand5() for 10000 calls to rand7(), achieving the minimum of log(7)/log(5) calls on average to 4 significant figures, and the resulting output was uniform.

In order to port this code to a language that doesn't have arbitrarily large integers built-in, you'll have to cap the values of pow5 and pow7 to the maximum value of your native integral type -- if they get too big, then reset everything and start over. This will increase the average number of calls to rand5() per call to rand7() very slightly, but hopefully it shouldn't increase too much even for 32- or 64-bit integers.

Adam Rosenfield 2009-05-21 03:38:42

+1 for a really interesting answer. Would it be possible, rather than resetting at a certain value, to simply shift off bits that have been used, and move the other bits up, and basically only keeping the bits that are going to be used? Or am I missing something?

Chris Lutz 2009-05-21 03:54:04

I'm not 100% sure, but I believe if you did that, you would skew the distribution ever so slightly (although I doubt that such skew would be measurable without trillions of trials).

Adam Rosenfield 2009-05-21 04:44:34

+1, I really like this answer

cube 2009-06-27 13:05:59

+1 Very interesting answer.

Nixuz 2009-07-01 00:59:36

FTW! I tried to make the bignums smaller but it can't be done because no power of 5 has factors in common with a power of 7! Also, good use of the yield keyword. Very well done.

Eyal 2009-09-02 07:05:57

+1, very interesting argument about the lower bound on the number of calls to rand5()

Krystian 2009-11-05 20:30:24

Answer 25

+1 A:

The premise behind Adam Rosenfield's correct answer is:

x = 5^n (in his case: n=2)
manipulate n rand5 calls to get a number y within range [1, x]
z = ((int)(x / 7)) * 7
if y > z, try again. else return y % 7 + 1

When n equals 2, you have 4 throw-away possibilities: y = {22, 23, 24, 25}. If you use n equals 6, you only have 1 throw-away: y = {15625}.

5^6 = 15625
7 * 2232 = 15624

You call rand5 more times. However, you have a much lower chance of getting a throw-away value (or an infinite loop). If there is a way to get no possible throw-away value for y, I haven't found it yet.
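
The steps above generalize to any n. Here is a minimal Python sketch of that idea, assuming a stand-in rand5() based on random.randint. The names throwaways and rand7 are chosen here for illustration and are not from the original answers:

```python
import random

def rand5():
    """Stand-in for the given generator (assumption): uniform integer in 1..5."""
    return random.randint(1, 5)

def throwaways(n):
    """Number of rejected values of y when n calls to rand5 are combined."""
    return 5 ** n % 7

def rand7(n=2):
    """Rejection sampling over y in [1, 5**n], following the steps above."""
    x = 5 ** n
    z = (x // 7) * 7
    while True:
        y = 0
        for _ in range(n):
            y = 5 * y + (rand5() - 1)   # build a base-5 number digit by digit
        y += 1                          # y is now uniform in [1, x]
        if y <= z:
            return y % 7 + 1

print([throwaways(n) for n in range(1, 7)])
```

The number of throw-away values is 5^n mod 7, which gives 4 for n = 2 and 1 for n = 6, matching the counts above.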

Dinah 2009-06-23 14:17:22

There is provably no case without throwaway values--if there was no throwaway, 5^n and 7^m would have a factor in common. But they're (powers of) primes, so they don't.

Rex Kerr 2010-03-10 19:28:21

@Rex Kerr: really good point and well said

Dinah 2010-03-10 20:21:12

Answer 26

+1 A:

Here's my answer:

static struct rand_buffer {
  unsigned v, count;
} buf2, buf3;

void push (struct rand_buffer *buf, unsigned n, unsigned v)
{
  buf->v = buf->v * n + v;
  ++buf->count;
}

#define PUSH(n, v)  push (&buf##n, n, v)

int rand16 (void)
{
  int v = buf2.v & 0xf;
  buf2.v >>= 4;
  buf2.count -= 4;
  return v;
}

int rand9 (void)
{
  int v = buf3.v % 9;
  buf3.v /= 9;
  buf3.count -= 2;
  return v;
}

int rand7 (void)
{
  if (buf3.count >= 2) {
    int v = rand9 ();

    if (v < 7)
      return v % 7 + 1;

    PUSH (2, v - 7);
  }

  for (;;) {
    if (buf2.count >= 4) {
      int v = rand16 ();

      if (v < 14) {
        PUSH (2, v / 7);
        return v % 7 + 1;
      }

      PUSH (2, v - 14);
    }

    // Get a number between 0 and 24
    int v = 5 * (rand5 () - 1) + rand5 () - 1;

    if (v < 21) {
      PUSH (3, v / 7);
      return v % 7 + 1;
    }

    v -= 21;
    PUSH (2, v & 1);
    PUSH (2, v >> 1);
  }
}

It's a little more complicated than others, but I believe it minimises the calls to rand5. As with other solutions, there's a small probability that it could loop for a long time.

Chris Suter 2009-09-09 03:44:47

This produces a distribution not much different from the other solutions but has the added disadvantage of being needlessly complex. It also suffers from the provably incorrect non-deterministic loop-forever possibility if the numbers are truly random. I still think the ones that produce a slightly less uniform distribution (though still far more than adequate) but guarantee deterministic behavior are better.

paxdiablo 2009-09-09 05:37:28

@Pax: Please enlighten me as to how this produces a non-uniform distribution. My analysis of the code, as well as my own testing, indicates that this produces a uniform distribution. As we've previously discussed, it's impossible to both produce a perfectly uniform distribution and have a guaranteed constant time upper bound of the running time.

Adam Rosenfield 2009-09-18 15:53:47

Answer 27

A:

There are elegant algorithms cited above, but here's one way to approach it, although it might be roundabout. I am assuming values generated from 0.

R2 = random number generator giving values less than 2 (sample space = {0, 1})
R8 = random number generator giving values less than 8 (sample space = {0, 1, 2, 3, 4, 5, 6, 7})

In order to generate R8 from R2, you run R2 three times and use the combined result of all 3 runs as a binary number with 3 digits. Here is the range of values when R2 is run three times:

0 0 0 --> 0
.
.
1 1 1 --> 7

Now to generate R7 from R8, we simply run R8 again if it returns 7:

int R7() {
  int x;
  do {
    x = R8();
  } while (x > 6);
  return x;
}

The roundabout solution is to generate R2 from R5 (just like we generated R7 from R8), then R8 from R2 and then R7 from R8.
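
A rough Python sketch of the whole chain follows, under the assumption that R5 returns values in 0..4. The lower-case function names are illustrative, and r2 here keeps the low bit of an accepted value, which is one of several equivalent ways to get a fair bit:

```python
import random

def r5():
    """Stand-in for the given generator (assumption): uniform integer in 0..4."""
    return random.randrange(5)

def r2():
    """One fair bit: reject 4, then keep the low bit of a uniform value in 0..3."""
    while True:
        x = r5()
        if x < 4:
            return x & 1

def r8():
    """Three fair bits read as a binary number in 0..7."""
    return (r2() << 2) | (r2() << 1) | r2()

def r7():
    """Reject 7 so the remaining values 0..6 stay equally likely."""
    while True:
        x = r8()
        if x < 7:
            return x
```

Each stage only rejects and retries, so uniformity is preserved at every step, at the cost of more calls to R5 than the direct approaches.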

Ashwin 2009-09-17 23:20:37

Answer 28

A:

Why not do it simple?

int random7() {
  return random5() + (random5() % 3);
}

The chances of getting 1 and 7 in this solution are lower due to the modulo. However, if you just want a quick and readable solution, this is the way to go.

Ante 2009-11-09 12:11:16

This does not produce a uniform distribution. This produces the numbers 0-6 with probabilities 2/25, 4/25, 5/25, 5/25, 5/25, 3/25, 1/25, as can be verified by counting all 25 possible outcomes.
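
The count over all 25 outcomes can be reproduced with a short Python sketch. It takes rand5 to return 0..4, matching the 0-6 output range used in this comment:

```python
from collections import Counter

# Enumerate every (a, b) pair of rand5 results, each in 0..4, and tally a + b % 3.
counts = Counter(a + b % 3 for a in range(5) for b in range(5))
distribution = [counts[v] for v in range(7)]
print(distribution)   # number of outcomes (out of 25) producing each value 0..6
```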

Adam Rosenfield 2009-12-05 03:40:23

Answer 29

+1 A:

As long as there aren't seven possibilities left to choose from, draw another random number, which multiplies the number of possibilities by five. In Perl:

$num = 0;
$possibilities = 1;

sub rand7
{
  while( $possibilities < 7 )
  {
    $num = $num * 5 + int(rand(5));
    $possibilities *= 5;
  }
  my $result = $num % 7;
  $num = int( $num / 7 );
  $possibilities /= 7;
  return $result;
}

bmcnett 2009-12-04 23:45:01

Answer 30

A:

The function you need is *rand1_7()*, I wrote rand1_5() so that you can test it and plot it.

import numpy
def rand1_5():
    return numpy.random.randint(5)+1

def rand1_7():
    q = 0
    for i in range(7):
        q += rand1_5()
    return q%7 + 1

Andrea Ambu 2009-12-05 07:27:01

Answer 31

A:

There you go, uniform distribution and zero rand5 calls.

def rand7():
    global seed
    seed += 1
    if seed >= 7:
        seed = 0
    return seed

Need to set seed beforehand.

Kugel 2009-12-29 01:46:23

Answer 32

A:

Here's a solution that fits entirely within integers and is within about 4% of optimal (i.e. uses 1.26 random numbers in {0..4} for every one in {0..6}). The code's in Scala, but the math should be reasonably clear in any language: you take advantage of the fact that 7^9 + 7^8 is very close to 5^11. So you pick an 11 digit number in base 5, and then interpret it as a 9 digit number in base 7 if it's in range (giving 9 base 7 numbers), or as an 8 digit number if it's over the 9 digit number, etc.:

abstract class RNG {
  def apply(): Int
}

class Random5 extends RNG {
  val rng = new scala.util.Random
  var count = 0
  def apply() = { count += 1 ; rng.nextInt(5) }
}

class FiveSevener(five: RNG) {
  val sevens = new Array[Int](9)
  var nsevens = 0
  val to9 = 40353607;
  val to8 = 5764801;
  val to7 = 823543;
  def loadSevens(value: Int, count: Int) {
    nsevens = 0;
    var remaining = value;
    while (nsevens < count) {
      sevens(nsevens) = remaining % 7
      remaining /= 7
      nsevens += 1
    }
  }
  def loadSevens {
    var fivepow11 = 0;
    var i=0
    while (i<11) { i+=1 ; fivepow11 = five() + fivepow11*5 }
    if (fivepow11 < to9) { loadSevens(fivepow11 , 9) ; return }
    fivepow11 -= to9
    if (fivepow11 < to8) { loadSevens(fivepow11 , 8) ; return }
    fivepow11 -= to8
    if (fivepow11 < 3*to7) loadSevens(fivepow11 % to7 , 7)
    else loadSevens
  }
  def apply() = {
    if (nsevens==0) loadSevens
    nsevens -= 1
    sevens(nsevens)
  }
}

If you paste a test into the interpreter (REPL actually), you get:

scala> val five = new Random5
five: Random5 = Random5@e9c592

scala> val seven = new FiveSevener(five)
seven: FiveSevener = FiveSevener@143c423

scala> val counts = new Array[Int](7)
counts: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0)

scala> var i=0 ; while (i < 100000000) { counts( seven() ) += 1 ; i += 1 }
i: Int = 100000000

scala> counts
res0: Array[Int] = Array(14280662, 14293012, 14281286, 14284836, 14287188,
14289332, 14283684)

scala> five.count
res1: Int = 125902876

The distribution is nice and flat (within about 10k of 1/7 of 10^8 in each bin, as expected from an approximately-Gaussian distribution).
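
The measured ratio of about 1.26 can also be estimated from the scheme itself. In this rough Python sketch (variable names are illustrative) each load uses 11 base-5 digits and yields 9, 8, 7 or 0 base-7 digits, so the long-run cost per output is 11 divided by the expected digits per load:

```python
from fractions import Fraction

total = 5 ** 11
# (digits produced, number of 11-digit base-5 values that produce them)
cases = [(9, 7 ** 9), (8, 7 ** 8), (7, 3 * 7 ** 7)]

# The accepted ranges must fit inside 5**11; the remainder is rejected.
assert sum(n for _, n in cases) < total
expected_digits = Fraction(sum(d * n for d, n in cases), total)
calls_per_output = Fraction(11) / expected_digits
print(float(calls_per_output))
```

This comes out close to the five.count / 10^8 figure from the REPL session above.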

Rex Kerr 2010-03-10 21:25:58

Answer 33

+1 A:

I know it has been answered, but this seems to work OK, although I cannot tell you if it has a bias. My 'testing' suggests it is, at least, reasonable.

Perhaps Adam Rosenfield would be kind enough to comment?

My (naive?) idea is this:

Accumulate rand5's until there is enough random bits to make a rand7. This takes at most 2 rand5's. To get the rand7 number I use the accumulated value mod 7.

To avoid the accumulator overflowing, and since only the accumulator's value mod 7 is ever used, I keep just the accumulator mod 7:

(5a + rand5) % 7 = (k*7 + (5a%7) + rand5) % 7 = ( (5a%7) + rand5) % 7
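
The identity itself is easy to confirm by brute force. This small Python sketch checks only the modular arithmetic, not whether the resulting rand7 is free of bias:

```python
# Exhaustive check of the identity for a range of accumulator values a
# and every rand5 result r in 0..4.
identity_holds = all(
    (5 * a + r) % 7 == ((5 * a) % 7 + r) % 7
    for a in range(1000)
    for r in range(5)
)
print(identity_holds)
```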

The rand7() function follows:

(I let the range of rand5 be 0-4 and rand7 is likewise 0-6.)

int rand7(){
  static int    a=0;
  static int    e=0;
  int       r;
  a = a * 5 + rand5();
  e = e + 5;        // added 5/7ths of a rand7 number
  if ( e<7 ){
    a = a * 5 + rand5();
    e = e + 5;  // another 5/7ths
  }
  r = a % 7;
  e = e - 7;        // removed a rand7 number
  a = a % 7;
  return r;
}

Edit: Added results for 100 million trials.

'Real' rand functions mod 5 or 7

rand5 : avg=1.999802 0:20003944 1:19999889 2:20003690 3:19996938 4:19995539

rand7 : avg=3.000111 0:14282851 1:14282879 2:14284554 3:14288546 4:14292388 5:14288736 6:14280046

My rand7

Average looks ok and number distributions look ok too.

randt : avg=3.000080 0:14288793 1:14280135 2:14287848 3:14285277 4:14286341 5:14278663 6:14292943

philcolbourn 2010-04-19 12:14:52

Answer 34

A:

just scale your output from your first function

0) you have a number in range 1-5
1) subtract 1 to make it in range 0-4
2) multiply by (7-1)/(5-1) to make it in range 0-6
3) add 1 to increment the range: Now your result is in between 1-7

2010-05-27 08:20:17

Sorry, this would only work if you are working with real numbers or doubles etc...Randomizing is a tricky subject!
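
A short Python sketch shows why integer scaling cannot work: five possible inputs can produce at most five distinct outputs, however they are rounded. The helper name scaled and the choice of round() are assumptions made for this illustration:

```python
def scaled(x):
    """Scale an integer in 1..5 onto 1..7, rounding to the nearest integer."""
    return round((x - 1) * (7 - 1) / (5 - 1)) + 1

outputs = {scaled(x) for x in range(1, 6)}
missing = set(range(1, 8)) - outputs   # values in 1..7 that can never occur
print(sorted(outputs), sorted(missing))
```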

2010-05-27 08:22:59

Answer 35

A:

rand5() + (rand5() / 2)

Raj 2010-06-28 14:28:11

Answer 36

A:

rand5() + (rand5() & 2)

Adrian 2010-07-22 23:21:27

This won't produce an even distribution.

Dinah 2010-08-13 19:15:27

Answer 37

A:

Simple and efficient:

int rand7 ( void )
{
    return 4; // this number has been calculated using
              // rand5() and is in the range 1..7
}

(Inspired by http://stackoverflow.com/questions/84556/whats-your-favorite-programmer-cartoon/84747#84747).

chiccodoro 2010-08-23 13:14:19

@chiccodoro : I guess all coders have it - A FUNNY BONE ! ... :)

Arkapravo 2010-08-24 04:01:42

If you are going to make a humorous post, at least make it original.

Nixuz 2010-08-24 04:55:57

@Nixuz: Have you never ever cited a joke by someone else? Don't take it so seriously, please.

chiccodoro 2010-08-24 07:20:41

Answer 38

A:

I don't like ranges starting from 1, so I'll start from 0 :-)

unsigned rand5()
{
    return rand() % 5;
}

unsigned rand7()
{
    int r;

    do
    {
        r =         rand5();
        r = r * 5 + rand5();
        r = r * 5 + rand5();
        r = r * 5 + rand5();
        r = r * 5 + rand5();
        r = r * 5 + rand5();
    } while (r > 15623);

    return r / 2232;
}

FredOverflow 2010-09-21 15:45:00

Answer 39

A:

function Rand7
   put 200 into x
   repeat while x > 118
      put ((random(5)-1) * 25) + ((random(5)-1) * 5) + (random(5)-1) into x
   end repeat
   return (x mod 7) + 1
end Rand7

Three calls to Rand5, which only repeats 6 times out of 125, on average.

Think of it as a 3D array, 5x5x5, filled with 1 to 7 over and over, and 6 blanks. Re-roll on the blanks. The rand5 calls create a three digit base-5 index into that array.

There would be fewer repeats with a 4D, or higher N-dimensional arrays, but this means more calls to the rand5 function become standard. You'll start to get diminishing efficiency returns at higher dimensions. Three seems to me to be a good compromise, but I haven't tested them against each other to be sure. And it would be rand5-implementation specific.
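
The trade-off between dimensions can be estimated without running trials. Below is a minimal Python sketch, assuming each attempt costs n rand5 calls and yields a single result. The function name expected_rand5_calls is made up here:

```python
from fractions import Fraction

def expected_rand5_calls(n):
    """Expected rand5 calls per rand7 result when n calls index a 5**n array."""
    cells = 5 ** n
    usable = cells - cells % 7          # cells filled with 1..7; the rest are blanks
    return Fraction(n * cells, usable)  # n calls per attempt, cells/usable attempts

for n in range(2, 7):
    print(n, float(expected_rand5_calls(n)))
```

Under this accounting the 2D version averages 50/21 (about 2.38) calls and the 3D version 375/119 (about 3.15), so the lower re-roll rate of higher dimensions does not by itself reduce the number of rand5 calls.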

2010-10-09 06:47:34
