views:

142

answers:

3

I am curious to know what the best way to generate a random integer R that is not in a provided set of integers (R∉N). I can think of several ways of doing this but I'm wondering what you all think.

A: 

Generate a random number R in the entire domain (subtract the size of N from the max value) that you want to use. Then loop through all N less than R and for each add 1 to R. This will give a random number in the domain that is not in N.

murgatroid99
That is not true random generation, because it will be weighted towards edge numbers - numbers that are near the elements in N.
aperkins
No it will not because it adds 1 to R whether or not there is a conflict.
murgatroid99
Yes, which is why it will end up around the edges - your R will be more likely to be the one right above one in the set of N than not - twice as likely. To hit a number right above another would become 2/P, where P is the entire set of numbers available, and all other numbers would be 1/P. If you had numbers in a row - say 2,3,4 - in the set, then for every number in a row, the chance of hitting the one above it (4 in this example) goes up by 1/P per number in a row after the first (4/P in my example).
aperkins
Actually, murgatroid99 is right. Imagine the set of numbers not in N. Its size is _K - N_. Now, generate a random index for this set. Finally, you need to count it off, and that is just jumping the holes, which is exactly what murgatroid99 is proposing.
Svante
It should be written more clearly, though; perhaps: "Loop through the ascendingly sorted set N, adding 1 to R for each element, until the next element is bigger than R."
Svante
@Svante: The problem is, if you want it really random, to do that method you must have the set of all possible elements - i.e. all that are not included in the set N. He will still have the problem of clustering around the front edges of the number sets, as it is not stated in the question from the op that the numbers are sequential. If the numbers are sequential then, yes, his answer could be correct.
aperkins
If the numbers were not sequential, then the question would not make much sense: You would need to store them element by element anyway, and then doing a set-difference and generating a random index is trivial.
Svante
+9  A: 

Let N be the size of the overall set, and let K be the size of the excluded set.

I depends on the size of the set you are sampling from. If the excluded set is much smaller than the overall range, just choose a random number, and if it is in the excluded set, choose again. If we keep the excluded set in a hash table each try can be done in O(1) time.

If the excluded set is large, choose a random number R in a set of size (N - K) and output the choice as the member of the non excluded elements. If we store just the holes in a hash table keyed with the value of the random number we can generate this in one sample in time O(1).

The cutoff point will depend on the size of (N - K)/N, but I suspect that unless this is greater than .5 or so, or you sets are very small, just sampling until you get a hit will be faster in practice.

deinst
+1 - showed both the selective and restrictive cases.
aperkins
wow very good solutions
Alan
Good answer in principle, but I would not be sure about that 0.9 threshold---I think it might be significantly lower in many cases and would begin to feel quite uneasy at about 0.5. I guess you can only test it in your application, but when in doubt, I would prefer the deterministic runtime.
Svante
There's no reason to use a tree for this. If you don't know what all the holes are, sampling is faster than building a tree. If you do know, you should have built a hash table using that time instead of a tree.
chuck taylor
@Svante: Random running time is not something to be scared of... even with a factor of 0.9, the expected number of trials is only 10.
Eyal Schneider
@chuck yes. I'll fix it. Doh!
deinst
@Svante the .9 is probably larger than practicable, particularly now that I realized that it no longer takes O(lg n), but random algorithms with non trivial failure rates are quite practicable.
deinst
Does "storing the holes" mean storing each of the numbers inside the negative sets as a separate item of the list? This will really allow O(1) random generation time, but building it is O(N) space and time...
Eyal Schneider
As I see it, you should store the valid ranges in a sorted array. This requires Klog K time, because the number of valid ranges is no more than K + 1. But this is only useful as a pre-process step! now you can generate a random number in O(log K) time.
Eyal Schneider
@Eyal yes it does take O(N lg(N)) time and O(n) space to construct (you need to sort). You other choice is storing the original list sorted which still takes O(n) space and O(n lg n) time.I suppose you could run through the entire list on each random generation if you were not doing many,
deinst
@deinst: Time complexity dependent on N is not optimal, since I assume that N may be very large (e.g. all integers). What I suggested in my comment is a solution that its time complexity depends only on K.
Eyal Schneider
@deinst: please ignore my comments... I had in mind the idea that N itself is large and continous, and this is not necessarily the case. If it were so, then my solution becomes very relevant.
Eyal Schneider
@deinst: returning to the original problem description, I still believe your solution is not optimal. In O(N) time and space you can build an (unsorted) array of all legal values, by using a hashtable for the restricted values. This also allows O(1) time for the random number generation.
Eyal Schneider
@Eyal: Yes, you are correct. I'll fix it after I get home. Thanks.
deinst
I implemented a hybrid solution which includes many of the cases noted above and it works very nicely even for large set sizes! Thanks for all the great suggestions!!
Alan
+1  A: 

Given your limited description? Find the maximum value of the elements in N. Generate only random numbers greater than that.

Nick Johnson