ansaurus

Question

Answer 1

+6 A:

Build a list with indexes that point to true values and select one of those at random. Requires O(n) for list traversal and one try for the random number.

Joey 2009-12-03 15:27:31

That certainly is faster than what I came up with, though it uses O(n) working space, where mine uses only constant working space. So there still might be room for improvement.

momeara 2009-12-03 15:34:15

Is it certainly faster? If true values are very rare, then it's almost certainly faster. If false values are very rare, then it's almost certainly slower. Where the break-even point is, I don't know.

Steve Jessop 2009-12-03 15:45:26

Yes, certainly the distribution of true/false values does matter for the question which algorithm is more efficient. But when that's not known all bets are off, as usual. Still, I find Jon's answer very nice and likely to be better than this.

Joey 2009-12-03 15:52:08

@Steve: It really depends on how often you need to do it. Building the index only happens once, so even if false values are extremely rare, using the index will eventually be faster.

Jerry Coffin 2009-12-03 16:13:13

Yes, I hadn't thought of that: there may be guarantees that a certain number of selections will be done before the next time the contents of the array are changed. If we're allowed to widen the scope a little, then it might be better to get rid of the array of flags entirely, and just keep a structure containing the "true" indices. Insertion/removal will be slower, but selection will be faster.

Steve Jessop 2009-12-03 17:14:05

Answer 2

+5 A:

Use the "pick a random element from an infinite list" algorithm.

Keep an index of your current pick, and also a count of how many true values you've seen.

When you see a true value, increment the count and then replace your pick with the current index with a probability of P=(1/count). (So you always pick the first one you find... then you might switch to the second one, with probability 1/2, then you might switch to the third one with probabilty 1/3 etc.)

This requires only one scan over the list and constant storage. (It does require you to work out a larger number of random numbers, however.) In particular, it doesn't ever require you to either buffer the list or go back to the start - so it can work on an unbounded input stream.

See this answer for a sample LINQ implementation of the simple "pick a random element" algorithm; it would just need minor tweaks.

Jon Skeet 2009-12-03 15:32:06

A few more details and a proof here: http://stackoverflow.com/questions/1133942/what-is-the-most-efficient-way-to-pick-a-random-card-from-a-deck-when-some-cards/1134286#1134286. This question is functionally a duplicate of that one, although phrased a bit differently. My instinct is that it will most likely be slower than the two-pass algorithm, assuming data in memory. But worth testing, if the two-pass performance is unacceptable for whatever reason.

Steve Jessop 2009-12-03 15:37:42

@Steve: It depends on the sparseness of "true" values vs cost of generating a random number. If you have a million entries in the list, only 2 of which are "true", then this is likely to be a win. If, on the other hand, you have a million entries *all* of which are true, the two-pass algorithm will probably be faster. In general I just like the elegance of one-pass constant storage algorithms :)

Jon Skeet 2009-12-03 15:44:50

Heh, I just made the same comment about sparseness on Johannes' answer. I agree about elegance, too, although I worry slightly that using a lot of random numbers makes it harder to analyse the effects of any weaknesses in the RNG.

Steve Jessop 2009-12-03 15:50:05

Thanks Jon and Steve, I suspected there was some technique like this but I wasn't aware of it before!

momeara 2009-12-04 05:57:16

ansaurus

tags:

views:

answers:

Fast random selection algorithm

related questions