views:

9531

answers:

24

Duplicate:

Unique random numbers in O(1)?

I want an pseudo random number generator that can generate numbers with no repeats in a random order.

For example:

random(10)

might return 5, 9, 1, 4, 2, 8, 3, 7, 6, 10

Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?


Edit:

Also I want it to be efficient in generating big numbers without the entire range.


Edit:

I see everyone suggesting shuffle algorithms. But if I want to generate large random number (1024 byte+) then that method would take alot more memory than if I just used a regular RNG and inserted into a Set until it was a specified length, right? Is there no better mathematical algorithm for this.

+14  A: 

In order to ensure that the list doesn't repeat, it would have to keep a list of numbers previously returned. As it has to therefore generate the entire list by the end of the algorithm, this is equivalent in storage requirement to generating the ordered list and then shuffling.

More about shuffling here: http://stackoverflow.com/questions/168037/creating-a-random-ordered-list-from-an-ordered-list

However, if the range of the random numbers is very large but the quantity of numbers required is small (you've hinted that this is the actual requirement in a comment), then generate a complete list and shuffling it is wasteful. A shuffle on a huge array involves accessing pages of virtual memory in a way that (by definition) will defeat the OS's paging system (on a smaller scale the same problem would occur with the CPU's memory cache).

In this case, searching the list-so-far will be much more efficient. So the ideal would be to use heuristics (determined by experiment) to pick the right implementation for the given arguments. (Apologies for giving examples in C# rather than C++ but ASFAC++B I'm training myself to think in C#).

IEnumerable<int> GenerateRandomNumbers(int range, int quantity)
{
    int[] a = new int[quantity];

    if (range < Threshold)
    {
        for (int n = 0; n < range; n++)
            a[n] = n;

        Shuffle(a);
    }
    else
    {
        HashSet<int> used = new HashSet<int>();

        for (int n = 0; n < quantity; n++)
        {
            int r = Random(range);

             while (!used.Add(r))
                 r = Random(range);

             a[n] = r;
        }
    }

    return a;
}

The cost of doing the checking for repeated numbers, the looping while there are collisions, etc. will be expensive, but there will likely be some Threshold value where it becomes faster than allocating for the entire range.

For sufficiently small quantity requirements, it may be faster to use an array for used and do linear searches in it, due to the greater locality, lower overhead, the cheapness of the comparison...

Also for large quantities AND large ranges, it might be preferable to return an object that produces the numbers in the sequence on request, instead of allocating the array for the results upfront. This is very easy to implement in C# thanks to the yield return keyword:

IEnumerable<int> ForLargeQuantityAndRange(int quantity, int range)
{
    for (int n = 0; n < quantity; n++)
    {
        int r = Random(range);

        while (!used.Add(r))
            r = Random(range);

        yield return r;
    }
}
Daniel Earwicker
Check out random_shuffle in <algorithm> for a clean C++ implementation of the Knuth-Fisher-Yates shuffle, provided you've populated your list of values into a RandomAccessIterator compatible container.
Jeremy CD
The first sentence is not true. A properly set up implementation of a linear feedback shift register will go through all the values in its range precisely once before it comes back to the beginning.
Steve Melnikoff
You're right, I'm assuming the source of randomness is a function taking the required range, which may be unnecessary. But I guess the problem with LFSR is scaling the output to a specified range without causing collisions?
Daniel Earwicker
+10  A: 

A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.

Mitch Wheat
Would the downvoter (to this old answer) please leave a comment. Thanks.
Mitch Wheat
A: 

Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.

Jason Punyon
Say I want to generate 100 numbers from 1-2^64 bits, if I shuffle N elements, I need 2^64 numbers that are 64 bits long. If I use the other solution, I only need storage for about 100 64-bit numbers.
Unknown
@unknown (yahoo): That isn't quite the question you asked.
Jason Punyon
+4  A: 

A shuffle is the best you can do for random numbers in a specific range with no repeats. The reason that the method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient is because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an indeterminable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.


Response to edits and comments:

If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.

Bill the Lizard
But if I want to generate 100 numbers that can range from 1 - 2^64 bits, it doesn't make sense to create a list of 2^64 numbers and then do a random shuffle from that.
Unknown
No, it doesn't. In that case the likelihood of collisions would be greatly reduced. The example in your question is quite misleading if that's the kind of sample and range you need.
Bill the Lizard
+2  A: 

If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.

starblue
Do you know how this compares to the LSFR in terms of randomness? Voted you up for the algorithm. I'm kind of worried about the fact that the numbers will lie in the "m1/n hyperplanes".
Unknown
LCGs have some serious problems, but the hyperplane issues are present in _all_ PRNGs. It depends on the algorithm, state word size and good choice of parameters at which dimensions this occurs and how many hyperplanes there are. E. g. The Mersenne Twister yield points in 624-dimensional hyperplanes
Joey
Ah I see, thank you for explaining that all PRNGS have that limitation.
Unknown
+4  A: 

If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).

Motti
+12  A: 

You may be interested in a linear feedback shift register. We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").

gbarry
Do you know how this compares to the linear congruential RNG in terms of randomness? Voted you up for the algorithm.
Unknown
Sorry, not a math guy. I can't see any similarity between them (though I instinctively suspect it).
gbarry
Preferrably you shouldn't try to build a RNG frmo scratch. Too much can go wrong. If you already got a decent RNG you can use (in Boost or so) you should by all means use that. Nearly all methods where period = modulus yield that many non-repeating numbers but that seems to be useless for you anyway
Joey
LFSRs are a very effective and simple solution - but you must pick a suitable constants for them to work properly. There are resources on the web to help with that.
Steve Melnikoff
+3  A: 

I understand tou don't want a shuffle for large ranges, since you'd have to store the whole list to do so.

Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc in turn.

There are infinite numbers of hashes like this. They're not too hard to generate if they're restricted to a power of 2, but any base can be used.

Here's one that would work for example if you wanted to go through all 2^32 32 bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.

unsigned int reversableHash(unsigned int x)
{
   x*=0xDEADBEEF;
   x=x^(x>>17);
   x*=0x01234567;
   x+=0x88776655;
   x=x^(x>>4);
   x=x^(x>>9);
   x*=0x91827363;
   x=x^(x>>7);
   x=x^(x>>11);
   x=x^(x>>20);
   x*=0x77773333;
   return x;
}
SPWorley
What does reversable mean in this context?
Daniel Earwicker
No idea, I was expecting "n == f(f(n))" but taht is definitely not the case.
Vatine
Reversable here means that no information is lost. If you have y=f(x), there's a reverse function x=g(y). This guarantees you have a 1-to-1 mapping and therefore your pseudorandom sequence will not repeat.
SPWorley
+1  A: 

What about using GUID generator (like in the one in .NET). Granted it is not guaranteed that there will be no duplicates, however the chance getting one is pretty low.

Rashack
As far as I know the current time is part of A GUID generation. Therefore it has certainly a correlation between numbers.
dmeister
A: 
Nathan Fellman
A: 

If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to each 'previous'.

const size_t amount = 100; // a limited amount of random numbers
vector<long int> numbers; 
numbers.reserve( amount );
const short int spread = 250; // about 250 between each random number
numbers.push_back( myrandom( spread ) );
for( int n = 0; n != amount; ++n ) {
    const short int increment = myrandom( spread );
    numbers.push_back( numbers.back() + increment );
}

myshuffle( numbers );

The myrandom and myshuffle functions I hereby generously delegate to others :)

xtofl
+1  A: 

This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.

Nick Johnson
A: 

Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.

Blank Xavier
A: 

If you want to creating large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator, that actually has enough entropy, then the odds of generating repeats are so miniscule as to not be worth worrying about.

For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?

Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.

Brian Campbell
A: 

Suppose you wanted to generate a series of 256 random numbers without repeats.

  1. Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
  2. Your looping variable will be n, the number of numbers yet to be generated
  3. Loop from n = 256 to n = 1
  4. Generate a random number r in the range [0, n)
  5. Find the r-th zero bit in your memory block b, let's call it p
  6. Put p in your list of results, an array called q
  7. Flip the p-th bit in memory block b to 1
  8. After the n = 1 pass, you are done generating your list of numbers

Here's a short example of what I am talking about, using n = 4 initially:

**Setup**
b = 0000
q = []

**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]

**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]

**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]

** Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
William Brendel
+1  A: 

As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.

The trade off is that your list could not be exhaustive in your range. If your numbers are truly on the order of 256^1024, that's hardly any trade off at all.

(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)

Imbue
A: 

Please check answers at

http://stackoverflow.com/questions/515665/generate-sequence-of-integers-in-random-order-without-constructing-the-whole-list/515704#515704

and also my answer lies there as

 very simple random is 1+((power(r,x)-1) mod p) will be from 1 to p for values of x from 1 to p and will be random where r and p are prime numbers and r <> p.
lakshmanaraj
+1  A: 

I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.

There are some drawbacks however: by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all values it will generate, making them not usable for cryptography and anywhere were a good RNG is needed. The second problem is that either the all zero word or the all one (in terms of bits) word is invalid depending on the LFSR implementation. The third issue which is relevant to your question is that the maximum number generated by the LFSR is always a power of 2 - 1 (or power of 2 - 2).

The first drawback might not be an issue depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers; so, the second issue does not seem relevant to your case. The maximum value (and thus range) problem can solved by reusing the LFSR until you get a number within your range. Here's an example:

Say you want to have numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR which has a range [1, 15] inclusive. Here's a pseudo code as to how to get number in the range [1,10]:

x = LFSR.getRandomNumber();
while (x > 10) {
   x = LFSR.getRandomNumber();
}

You should embed the previous code in your RNG; so that the caller wouldn't care about implementation. Note that this would slow down your RNG if you use a large shift-register and the maximum number you want is not a power of 2 - 1.

gsarkis
A: 

I asked a similar question before but mine was for the whole range of a int see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/

David Allan Finch
A: 

You need a 1024 byte number? just generate it, take the best byte random number generator and run it 1024 times. You have better chances that you computer will turn into human then that a collision will happen.

Shay Erlichmen
A: 
static std::unordered_set<long> s;
long l = 0;
for(; !l && (s.end() != s.find(l)); l = generator());
v.insert(l);

generator() being your random number generator. You roll numbers as long as the entry is not in your set, then you add what you find in it. You get the idea.

I did it with long for the example, but you should make that a template if your PRNG is templatized.

Alternative is to use a cryptographically secure PRNG that will have a very low probability to generate twice the same number.

Edouard A.
A: 

If you don't mean poor statisticall properties of generated sequence, there is one method:

Let's say you want to generate N numbers, each of 1024 bits each. You can sacrifice some bits of generated number to be "counter".

So you generate each random number, but into some bits you choosen you put binary encoded counter (from variable, you increase each time next random number is generated).

You can split that number into single bits and put it in some of less significant bits of generated number.

That way you are sure you get unique number each time.

I mean for example each generated number looks like that: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx where x is take directly from generator, and ys are taken from counter variable.

A: 

Mersenne twister

Description of which can be found here on Wikipedia: Mersenne twister

Look at the bottom of the page for implementations in various languages.

Indeera
A: 

The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).

Expanding on the linear feedback shift register answer: for a given M, construct a maximal LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers so space consumption is O(1) too. If you cycle before getting N numbers then M less than N (or the LFSR is constructed incorrectly).

You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf

Here's some java code:

/** * Generate a sequence of unique "random" numbers in [0,M) * @author dkoes * */

public class UniqueRandom { long lfsr; long mask; long max;

private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
        null, // 0
        null, // 1
        null, // 2
        {3,2}, //3
        {4,3},
        {5,3},
        {6,5},
        {7,6},
        {8,6,5,4},
        {9,5},
        {10,7},
        {11,9},
        {12,6,4,1},
        {13,4,3,1},
        {14,5,3,1},
        {15,14},
        {16,15,13,4},
        {17,14},
        {18,11},
        {19,6,2,1},
        {20,17},
        {21,19},
        {22,21},
        {23,18},
        {24,23,22,17},
        {25,22},
        {26,6,2,1},
        {27,5,2,1},
        {28,25},
        {29,27},
        {30,6,4,1},
        {31,28},
        {32,22,2,1},
        {33,20},
        {34,27,2,1},
        {35,33},
        {36,25},
        {37,5,4,3,2,1},
        {38,6,5,1},
        {39,35},
        {40,38,21,19},
        {41,38},
        {42,41,20,19},
        {43,42,38,37},
        {44,43,18,17},
        {45,44,42,41},
        {46,45,26,25},
        {47,42},
        {48,47,21,20},
        {49,40},
        {50,49,24,23},
        {51,50,36,35},
        {52,49},
        {53,52,38,37},
        {54,53,18,17},
        {55,31},
        {56,55,35,34},
        {57,50},
        {58,39},
        {59,58,38,37},
        {60,59},
        {61,60,46,45},
        {62,61,6,5},
        {63,62},
};

//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
    max = m;
    lfsr = seed; //could easily pass a starting point instead
    //figure out number of bits
    int bits = 0;
    long b = m;
    while((b >>>= 1) != 0)
    {
        bits++;
    }
    bits++;

    if(bits < 3)
        bits = 3; 

    mask = 0;
    for(int i = 0; i < taps[bits].length; i++)
    {
        mask |= (1L << (taps[bits][i]-1));
    }

}

//return -1 if we've cycled
long next()
{
    long ret = -1;
    if(lfsr == 0)
        return -1;
    do {
        ret = lfsr;
        //update lfsr - from wikipedia
        long lsb = lfsr & 1;
        lfsr >>>= 1;
        if(lsb == 1)
            lfsr ^= mask;

        if(lfsr == seed)            
            lfsr = 0; //cycled, stick

        ret--; //zero is stuck state, never generated so sub 1 to get it
    } while(ret >= max);

    return ret;
}

}

dkoes