ansaurus

Question

Choose random array element satisfying certain property

Answer 1

+6 A:

Yes, I believe so.

The first time you encounter a matching element, you definitely pick it. The next time, you pick the new value with a probability of 1/2, so each of the two elements have an equal chance. The following time, you pick the new value with a probability of 1/3, leaving each of the other elements with a probability of 1/2 * 2/3 = 1/3 as well.

I'm trying to find a Wikipedia article about this strategy, but failing so far...

Note that more generally, you're just picking a random sample out of a sequence of unknown length. Your sequence happens to be generated by taking an initial sequence and filtering it, but the algorithm doesn't require that part at all.

I thought I'd got a LINQ operator in MoreLINQ to do this, but I can't find it in the repository... EDIT: Fortunately, it still exists from this answer:

public static T RandomElement<T>(this IEnumerable<T> source,
                                 Random rng)
{
    T current = default(T);
    int count = 0;
    foreach (T element in source)
    {
        count++;
        if (rng.Next(count) == 0)
        {
            current = element;
        }            
    }
    if (count == 0)
    {
        throw new InvalidOperationException("Sequence was empty");
    }
    return current;
}

Jon Skeet 2009-06-08 17:59:51

Jon, as far as I see it, this algorithm will ALWAYS pick the first element that meets p, what I am missing?

tekBlues 2009-06-08 18:10:46

@tekBlues: it keeps going after it's picked the first one.

AakashM 2009-06-08 18:13:43

I'm pretty sure this algorithm works, if the random generator does its job properly.

ammoQ 2009-06-08 18:14:20

aakashm, he says he only wants to pick one element

tekBlues 2009-06-08 18:22:24

Instead of "picks", think of the algorithm as "holding" elements. It holds the first one, then might drop it to hold the second one, etc. At the end, you're left holding some element, which is the one that you return.

Nate Kohl 2009-06-08 18:23:17

@tekBlues: It picks the first one, then potentially replaces its pick with the next matching one, then potentially replaces it again etc. Note that it *doesn't* return the moment it's picked one - it returns when it's gone through the whole sequence.

Jon Skeet 2009-06-08 18:33:34

Answer 2

A:

For clarity's sake, I would try:

pickRandElement(elements, p)
     OrderedCollection coll = new OrderedCollection
     foreach element in elements
          if (p(element))
               coll.add(element)
     if (coll.size == 0) return null
     else return coll.get(randInt(coll.size))

To me, that makes it MUCH clearer what you're trying to do and is self-documenting. On top of that, it's simpler and more elegant, and it's now obvious that each will be picked with an even distribution.

JoeCool 2009-06-08 18:12:22

That's the code we have now (and I admit it is more clear). I'm hoping for something more efficient. Creating a list and adding elements to it, I'm guessing, is somewhat inefficient and wasteful, if the proposed alternative will work.

Paul Reiners 2009-06-08 18:19:39

Once you see what the proposed algorithm is doing, it's extremely elegant IMO.

Jon Skeet 2009-06-08 18:34:51

Yes, you're right, if efficiency is the top priority. I would say that the more obvious solution that I provide is more readable though.

JoeCool 2009-06-08 18:38:02

Answer 3

+3 A:

In The Practice of Programming, pg. 70, (The Markov Chain Algorithm) there is a similar algorithm for that:

[...]
  nmatch = 0;
  for ( /* iterate list */ )
    if (rand() % ++nmatch == 0) /* prob = 1/nmatch */
      w = suf->word;
[...]

"Notice the algorithm for selecting one item at random when we don't know how many items there are. The variable nmatch counts the number of matches as the list is scanned. The expression
rand() % ++nmatch == 0
increments nmatch and is then true with probability 1/nmatch."

Nick D 2009-06-08 18:13:56

Answer 4

+10 A:

It works mathematically. Can be proven by induction.

Clearly works for n = 1 element satisfying p.

For n+1 elements, we will choose the element n+1 with probability 1/(n+1), so its probability is OK. But how does that effect the end probability of choosing one of the prior n elements?

Each of the prior n had a chance of being selected with probability 1/n, until we found element n+1. Now, after finding n+1, there is a 1/(n+1) chance that element n+1 is chosen, so there is a n/(n+1) chance that the previously chosen element remains chosen. Which means its final probability of being the chosen after n+1 finds is 1/n * (n/n+1) = 1/n+1 -- which is the probability we want for all n+1 elements for uniform distribution.

If it works for n = 1, and it works for n+1 given n, then it works for all n.

John Pirie 2009-06-08 18:38:18

Induction has saved my butt too many times in computation proofs!

JoeCool 2009-06-08 18:40:58

There's actually a simpler way to prove this.For n elements, we will choose the element n with probability 1/n. But what about the prior n-1 elements?Well, under induction we know that all these n-1 elements have the same probability. So the probability for each _must_ be 1/n, since 1/n is the only number that's 1 when multiplied with n. qed :)

FeepingCreature 2009-12-07 17:28:58

Answer 5

+1 A:

decowboy has a nice proof that this works on TopCoder

Paul Reiners 2009-06-08 19:13:52

ansaurus

tags:

views:

answers:

Choose random array element satisfying certain property

related questions