views:

575

answers:

10

I need to pick a random element from a list, that fulfills certain condition. The approach I've been using works, but I'm sure is not every efficient. What would be the most efficient way to do it?

The following code is inside a while (true) loop, so obviously is not very efficient to shuffle the list on every iteration.

Foo randomPick = null;
Collections.shuffle(myList);
for (Foo f : myList) {
    if (f.property) {
        randomPick = f;
        break;
    }
}

Thanks in advance!

+2  A: 

Instead of shuffling, why not pick a random index integer, and then test its property?


  public Foo picker() {
    while (true) { // TODO: realistically, you need to deal with exit conditions
      int randIdx = myRand.nextInt();
      Foo randomPick = myList.get(randIdx % randIdx);
      if (randomPick.property)
        return randomPick;
    }
  }

Note that this does indeed assume that a non-trivial number of the list do have property true, and that these properties change.

If the two assumptions were to be falsified, you'd want to select a subset, and then randomly select one of those.

CPerkins
even if most of the elements don't fulfill the property? let's say that 80% of them don't fulfill the property
davidrobles
But what if the property tests false? You need to keep checking elements until you find one that is true. If there are only a small number of elements that are true, your solution could take a long time to terminate.
Mark Byers
You need to keep track of what elements you’ve already visited. Otherwise picking a random element instead of iterating all would cause an infinite loop if there are no appropriate items at all.
Gumbo
@David: True, if most don't fulfill, this isn't ideal, but then you face uglier choices. Picking a subset exposes you to property mutability issues, and performance could degrade.@Mark - yes, this can take a long time.@Gumbo - Random generally doesn't loop like that. You could certainly check the same element twice, but it would exit in time, given that there *are* elements which fulfill. If not, you're stuck with the subset problem.
CPerkins
@CPerkins: not just a long time. If no element satisifes the property, it will take an *infinite* amount of time.
Mark Byers
And knowing whether or not there are any elements that satisfy the property requires in the worst case you to enumerate the entire list and check the property for every element. My suggestion (lazy shuffle) avoids both these problems.
Mark Byers
A: 

I would suggest making a temporary filtered list out of myList where every item satisfies your condition. Then at every iteration you can shuffle it and pick the top item, or pick a random item using a random index.

Joy Dutta
+6  A: 

The most efficient solution will partly depend on how often you're going to pick random elements, whether you want to pick different random elements, and what proportion of the elements meet the criterion

A few options:

  • Create a copy containing only the elements meeting the criterion. You can then either shuffle that and iterate over it for successive distinct random elements, or just pick arbitrary random elements by picking a random index. This is obviously O(n) setup in both time and space, but efficient thereafter.
  • Shuffle the collection once, then iterate over as you are doing - but keep the iterator. Alternatively, perform the iteration manually using the index. This will allow you to get distinct random elements. This is O(n) setup again.
  • Keep picking random elements from the original list until you find one which meets the criteria. This has could take a very long time if you have a large list with only a few "valid" items though. This requires no setup, but you could end up "wasting" work by repeatedly testing the same elements.
  • A hybrid approach: keep picking random elements from the original list, but remove them if they don't meet the criterion. Unfortunately removal from the middle of a list is O(n) operation too for the common case of ArrayList :(

(Note that this I've been mostly assuming ArrayList complexity. Getting to a specific index in a LinkedList is O(n) for example, but then removal is cheap.)

Jon Skeet
As usual, a cogent analysis from the pony.
CPerkins
Picking items at random and don’t keeping track of the visited items would cause an infinite loop if there are no appropriate items at all.
Gumbo
Removal from the middle of a list is not a O(n) operation, it's a O(1) operation, assuming you have an index on your list for seeking to an element, or are already pointed to it.
Zak
Sorry, I should have specified ArrayList. It's O(1) for a linked list, but it's much more work to get to a random element for a linked list.
Jon Skeet
Yes, I'm using in fact an ArrayList, so maybe the hybrid approach is not that efficient in my problem.Thanks
davidrobles
You didn't mention lazy shuffle. Is there a reason why you don't like this solution?
Mark Byers
A: 
    List<Foo> matchingItems = new ArrayList<Foo>();
    for(Foo f : myList) {
      if (f.property) matchingItems.add(f);
    }

   Random randomGenerator = new Random();
   int index = randomGenerator.nextInt(matchingItems.size());
   Foo randomFoo = matchingItems.get(index);
zmf
This requires evaluating the property for every element in the list. Evaluating the property could be a slow operation.
Mark Byers
It's clear from the initial statement of the problem that this is not a concern; the submitter even illustrated it as a simple field access.
Kevin Bourrillion
+3  A: 

Use a lazy shuffle: it calculates and yields the shuffled elements only as they are needed.

A C# implementation, but easily converted to Java: http://stackoverflow.com/questions/1287567/c-is-using-random-and-orderby-a-good-shuffle-algorithm/1287572#1287572

Mark Byers
+1  A: 

Since you are doing a shuffle, you are basically touching every element at least once, so you already have a big O of at least N. If you pick a random index, then test the element at that location, then you are going to get the variable you want before you have touched N elements, guaranteed, thus a guaranteed improvement. If you have a 20% distribution of elements then you would expect every 5th random index to give you an element that meets your criteria. While this is not a guarantee, it is a probabliltiy. Absolute worst case scenario, you would choose all 80% of the elements that didn't meet your criteria, then the next one would be your random element. you max execution would be limited to would be would be .8N + 1, still better than N. and on average your big O cost would be something like a constant of 5-10 . WAAAAY better in terms of execution as N increases.

Zak
> If you pick a random index, then test the element at that location, then you are going to get the variable you want before you have touched N elements, guaranteed,You talk a lot about guarantees, but how exactly do you guarantee that you aren't hitting the same element more than once in O(N)?
Mark Byers
and thus the guarantee crumbles.. def my bad. I was envisioning just removing that elem from the index list, then doing a rand on N-1 as an algorith recursively. But I realize to create the index is cost (N) ... The only thing I could think would be to attach a pointer to the next good elem on each elem when it is crossed off, but I think in aggregate that operation would have Log(n) cost itself...
Zak
One other option would be to create a second arrraylist, as elems are crossed off, add them at the same index to the arrayList, along with a pointer to the nextGood Elem. I think this basically turns the thing into a modified lazy shuffle as you were pointing out in your answer.
Zak
+1  A: 

Just choose a random index;

Random r = new Random();
while (true)
{
    ...
    Foo element = null;
    while (element == null)
    {
        Foo f = myList.get(r.nextInt(myList.size());
        if (f.property) element = f;
    }
    ...
}
Martijn Courteaux
A: 

You could use a linear congruential random number generator which cycles once through the indexes of your list, plus maybe a few more to be able to chose suitable parameters, and then check the values indexed by this sequence, discarding indexes which are too big and picking the first element you find. If you find nothing you can stop when the random sequence starts repeating.

For this to work m needs be at least as large as the list. To be able to chose a the modulus m must contain a factor of 8 or a square of some prime >= 3. c should be chosen randomly so that it is relatively prime to m. Also pick the initial value randomly.

starblue
A: 

Randomly shuffling the array more than once is silly.

Shuffle once. Then, for each request of a random value, return the next value in order until you've exhausted your list.

Andrew Raffensperger
A: 

O(n):

Random r = new Random();
int seen = 0;
Foo randomPick = null;
for (Foo foo : myList) {
  if (foo.property && r.nextInt(++seen) == 0) {
    randomPick = foo;
  }
}
Kevin Bourrillion