views:

427

answers:

5

What are some alternative methods to generate 1000 distinct random integers in the range [0,8000] as opposed to the following:

  1. naive method: generating a number and checking if it's already in the array. O(n^2)
  2. linear shuffle: generate sequence 0 to 8000, shuffle, take the first 1000. O(n)
+12  A: 

You can use a partial Fisher-Yates shuffle implemented using swaps. One of the nice features of this algorithm is that if you stop after k swaps, the first k numbers are a random sample of size k from the complete set.

Mark Byers
+1. I was going to suggest just shuffling the whole array; for some reason this obvious optimisation didn't occur to me.
Nick Johnson
Problem is that you still need to create an array from [0,8000]. Can you create a permutation without the overhead? What if you wanted 10 unique numbers between 1..1,000,000?
Ray
Technically you could make it a sparse array implemented as a dict instead, and any position which doesn't have a value set is simply taken to be equal to its index.
Amber
@Dav: Good suggestion. This would save the O(n) construction of the list of integers from [0, 8000].
Mark Byers
I don't recall mentioning anything about a bitarray?
Amber
@Dav: Corrected. :P
andras
@Dav: (great idea: was doing something along those lines for this and in general, k-permutations: http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values/2394308#2394308)
andras
+2  A: 

You could create a list containing the numbers 0 to 8000.

Then looping 1000 times generate a random number between 0 and the length of the list.

Remove that element from the list and add it to an output list.

By removing the element you ensure that your selections are unique.

while (outputList.Count < 1000)
{
    index = random.Next(0, inputList.Count);
    outputList.Add(inputList[index]);
    inputList.RemoveAt(index);
}
ChrisF
This will run in O(n^2) since RemoveAt for a vector or array is O(n).
Dave Kirby
+1  A: 

This is from Knuth's the Art of Programming (via Jon Bentley's Programming Pearls), implemented in Python:

import random

# randomly select m numbers from n candidates    
def random_select(m, n):
    select = m
    result = []
    for i in xrange(n):
        if random.randint(0, n-i) < select:
            result.append(i)
            select -= 1
    return result

random_select(1000, 8000)

this will generate a list of random numbers in numerical order. It works by iterating over all the integers from 0-n (i.e 0-8000), and randomly selecting them with a probability of(number left to select / number of remaining candidates). It runs in O(n), so do not try it if n is very large compared to m - e.g. selecting ten numbers out of a billion. It uses no memory other than the result list (m) and a few local variables, unlike solutions that rely on shuffling a list of length n.

If you want the result in a random order then shuffle the list afterwards.

Dave Kirby
The Pythonic variant: import random; random.sample(xrange(8000), 1000)
Ants Aasma
+1  A: 

Partial Fisher-Yates, as @Mark has suggested, with a little twist, storing the swaps along the way.
This way, it will at most consume as much memory as the result list O(m).
It will also run in O(m) - not O(n), as other solutions that enumerate the whole range - so it should not have problems on larger ranges.
This way, you can have the best of both worlds.

/// <summary>
/// Generates unique random numbers
/// <remarks>
/// Worst case memory usage is O(min((emax-imin)/2, num))
/// </remarks>
/// </summary>
/// <param name="random">Random source</param>
/// <param name="imin">Inclusive lower bound</param>
/// <param name="emax">Exclusive upper bound</param>
/// <param name="num">Number of integers to generate</param>
/// <returns>Sequence of unique random numbers</returns>
public static IEnumerable<int> UniqueRandoms(
    Random random, int imin, int emax, int num)
{
    int dictsize = num;
    long half = (emax - (long)imin + 1) / 2;
    if (half < dictsize)
        dictsize = (int)half;
    Dictionary<int, int> trans = new Dictionary<int, int>(dictsize);
    for (int i = 0; i < num; i++)
    {
        int current = imin + i;
        int r = random.Next(current, emax);
        int right;
        if (!trans.TryGetValue(r, out right))
        {
            right = r;
        }
        int left;
        if (trans.TryGetValue(current, out left))
        {
            trans.Remove(current);
        }
        else
        {
            left = current;
        }
        if (r > current)
        {
            trans[r] = left;
        }
        yield return right;
    }
}
andras
A: 

Sorted list with no sort, O(n)

If you want the integers sorted, I got to this answer in another question with a lot of help. You can do it using an exponential variate and thereby avoid any sort. As a result it is O(n):

From Alok's answer and Dan Dyer's comment it turns out that using an exponential distribution for a set of deltas gives a uniform distribution of integers in sequence.

So, you just start generating numbers and then scale them at the end. Adding 1 to the delta ensures you never repeat a value.

import random,sys,math

def genSortedInts(mini,maxi,vals):
    running = 0
    deltas = [random.expovariate(1.0) for i in range(0,vals+1)]
    floats = []
    for d in deltas:
        running += d
        floats.append(running)
    upper = floats.pop()
    valRange = maxi-mini-(vals-1)
    ints = [mini+int(f/upper*valRange)+id for id,f in enumerate(floats)]
    return ints

if __name__ == "__main__":
    vals = 10
    maxi = 80
    mini = 0
    print(genSortedInts(mini,maxi,vals))

Note the use of random.expovariate(1.0), a Python exponential distribution random number generator (very useful!). Here it's called with a mean of 1.0 (arg is 1/mean), but since the script normalises against the last number in the sequence, the mean itself doesn't matter.

Output (fair dice roll) for 10 values up to 80:

[3, 5, 10, 16, 25, 37, 41, 45, 57, 70]
Phil H