views:

264

answers:

5

I know similar questions come up a lot and there's probably no definitive answer, but I want to generate five unique random numbers from a subset of numbers that is potentially infinite (maybe 0-20, or 0-1,000,000).
The only catch is that I don't want to have to run while loops or fill an array.

My current method is to simply generate five random numbers from a subset minus the last five numbers. If any of the numbers match each other, then they go to their respective place at the end of the subset. So if the fourth number matches any other number, it will bet set to the 4th from the last number.

Does anyone have a method that is "random enough" and doesn't involve costly loops or arrays?

Please keep in mind this a curiosity, not some mission-critical problem. I would appreciate it if everyone didn't post "why are you having this problem?" answers. I am just looking for ideas.
Thanks a lot!

+3  A: 

Your best option is a loop, as in:

$max = 20;
$numels = 5;
$vals = array();
while (count($vals) < $numels) {
    $cur = rand(0, $max);
    if (!in_array($cur, $vals))
        $vals[] = $cur;
}

For small ranges, you can use array_rand:

$max = 20;
$numels = 5;
$range = range(0, $max);
$vals = array_rand($range, $numels);

You could also generate a number between 0 and max, another between 0 and max-1, ... between 0 and max-4. Then you would sum x to the n-th generated number where x is the number calculated in this fashion:

  • Take the number generated in the n-th iteration and assign it to x
  • if it's larger or equal to that generated in the first iteration, increment it
  • if this new number is larger or equal to that generated (and corrected) in the second iteration, increment it
  • ...
  • if this new number is larger or equal to that generated (and corrected) in the (n-1)-th iteration increment it

The mapping is like this:

1 2 3 4 5 6 7 8 9 (take 4)
1 2 3 4 5 6 7 8 9 (gives 4)

1 2 3 4 5 6 7 8 (take 5)
1 2 3 5 6 7 8 9 (gives 6)

1 2 3 4 5 6 7 (take 6)
1 2 3 5 7 8 9 (gives 8)

1 2 3 4 5 6 (take 5)
1 2 3 5 7 9 (gives 7)

example, last extraction:
x = 5
x >= 4? x == 6
x >= 6? x == 7
x >= 8? x == 7
Artefacto
I think your loop idea is what he is specifically trying to avoid, while your last idea is a proper answer to his question.
VeeArr
I don't think your second method will prevent duplicates. For example, I could generate 10, then 9, and thus I have 10, 10.
erisco
@erisco I've (hopefully) fixed it.
Artefacto
@VeeArr after all it's still sensible answer to a nonsense question.
Col. Shrapnel
@Col. Shrapnel He, maybe he wants something with which to prove total correctness.
Artefacto
@Artefacto: it looks like using one of these methods depending on size would provide a robust solution to the problem, as aioobe pointed out Python already does. thanks for the tips.
tau
I must not understand your new algorithm, because as far as I can tell my case will still happen.
erisco
@erisco you're right, let me see how I can fix it...
Artefacto
@erisco OK, check it now.
Artefacto
I originally had a comment here about a distribution problem, but I have now retracted it. An afterthought revealed to me that figuring out if the distribution is still okay is not as trivial as I thought.
erisco
+4  A: 

One random number call is enough.

If you want to choose a subset of 5 unique numbers in range 1-n, then select a random number in 1 to (n choose r).

Keep a 1-1 mapping from 1 to (n choose r) to the set of possible 5 element subsets, and you are done. This mapping is standard and can be found on the web, for instance here: http://msdn.microsoft.com/en-us/library/aa289166%28VS.71%29.aspx

As an example:

Consider the problem of generating a subset of two numbers from five numbers:

The possible 2 element subset of {1,..., 5} are

1. {1,2}
2. {1,3}
3. {1,4}
4. {1,5}

5. {2,3}
6. {2,4}
7. {2,5}

8. {3,4}
9. {3,5}

10. {4,5}

Now 5 choose 2 is 10.

So we select a random number from 1 to 10. Say we got 8. Now we generate the 8th element in the sequence above: which gives {3,4}, so the two numbers you want are 3 and 4.

The msdn page I linked to, shows you a method to generate the set, given the number. i.e. given 8, it gives back the set {3,4}.

Moron
Interesting answer. Though I can't find an explanation of such 1-1 mapping! Perhaps I can't figure out good search terms. Care to help us out with a url or two in your answer?
aioobe
10000 choose 5 is 832500291625002000, quite bigger than PHP_INT_MAX. +1 for the interesting answer, though.
Artefacto
correction: it's not bigger in 64-bit machines, but my point stands, for 20k it would be.
Artefacto
@aioobe: I have added a link.
Moron
@Artefacto: Yes, n choose 5 probably requires 4 more registers than n, but in the worst case, we call the random number generator 5 times (i.e work with multiple ints), which might still work out better than trying to loop and eliminate dupes. Thanks for pointing that out though. Even though OP only asked for ideas, it is good to point out possible traps in using that idea.
Moron
Fun fact: http://www.wolframalpha.com/input/?i=Binomial%5Bx%2C5%5D+%3E+2%5E63-1%2C+Binomial%5Bx-1%2C5%5D+%3C+2%5E63-1
Artefacto
@Artefacto: Nice link :-)
Moron
@Artefacto: btw, if we use two 64 bit integers, we can cater to generating a random 5 element subset of {1,... 115344834}. So only 2 random number calls are needed for that. Adding one more int will probably square this number, so I would say this method is not that bad inspite of the huge numbers involved.
Moron
@Moron: could you demonstrate how to apply this? your math is clearly more powerful than mine.
tau
@tau: I used the link which Artefacto was kind enough to provide. Any mistake, blame _him_ :-)
Moron
@tau: You were talking about my comment about 115344834, but not the answer, right?
Moron
@Moron: sorry, no, i meant your answer. thanks.
tau
@tau: I have edited the answer to add some explanation. Let me know if you need more clarification.
Moron
@Moron: that explanation is the push i needed. thanks a lot!
tau
@tau: You are most welcome :-)
Moron
A: 

Since you are just looking for different ideas here's one:

Call out to Random.org to generate the set of random numbers you need.

Robert
...and yes I realize if you are not willing to eat the cost of a while loop you are most likely not willing to make a call across a network for your numbers, but you did say you wanted different ideas.
Robert
believe me, i do appreciate the variety of at least proposing a novel solution hehe.
tau
+2  A: 

The general form of this question is really interesting. Should one select from a pool of elements (and remove them from the pool) or should one loop "while hitting" an already taken element?

As far as I can tell, the python library implementation for random.sample chooses at runtime between the two methods depending on the proportion of the size of the input list and the number of elements to select.

A comment from the source code:

    # When the number of selections is small compared to the
    # population, then tracking selections is efficient, requiring
    # only a small set and an occasional reselection.  For
    # a larger number of selections, the pool tracking method is
    # preferred since the list takes less space than the
    # set and it doesn't suffer from frequent reselections.

In the specific instance that the OP mentions however (selecting 5 numbers), I think that looping "while hitting a taken number" is ok, unless the pseudo random generator is broken.

aioobe
interesting to know this. thanks!
tau
A: 

If you know the size N then keep each number with probability 5/N generate a random number between 0 and 1 and if it is less than 5/N keep the item. Stop when we have 5 items.

If we don't know N use resorvoir sampling.

Fakrudeen
could you please provide an example? im not sure i follow.
tau
Look for resorvoir sampling in Knuth's Volume 2 and please read that section. That section contains these 2 algorithms.
Fakrudeen