views:

9885

answers:

19

How do I pick a random element from a set? I'm particularly interested in picking a random element from a HashSet or a LinkedHashSet, in Java. Solutions for other languages are also welcome.

A: 

In Python:

s = set(1, 2, 3)
print random.choice(s)
Alexander Kojevnikov
This is actually false, as it produces following error: TypeError: 'set' object is unindexable.Try random.choice(list(s)) instead...
Daren Thomas
A: 

Since you said "Solutions for other languages are also welcome", here's the version for Python:

>>> import random
>>> random.choice([1,2,3,4,5,6])
3
>>> random.choice([1,2,3,4,5,6])
4
Swaroop C H
Only, [1,2,3,4,5,6] is not a set, but a list, since it doesn't support things like fast lookups.
Thomas Ahle
You can still do: >>> random.choice(list(set(range(5)))) >>> 4 Not ideal but it'll do if you absolutely need to.
SapphireSun
+3  A: 

In Java:

Set<Integer> set = new LinkedHashSet<Integer>(3);
set.add(1);
set.add(2);
set.add(3);

Random rand = new Random(System.currentTimeMillis());
for (int i = 0; i < 10; ++i) {
    System.out.println(set.toArray()[rand.nextInt(set.size())]);
}
smink
Your answer works, but it's not very efficient because of the set.toArray( ) part.
Clue Less
you should move the toArray to outside the loop.
David Nehme
+8  A: 
int size = myHashSet.size();
int item = new Random().nextInt(size); // In real life, the Random object should be rather more shared than this
int i = 0;
for(Object obj : myhashSet)
{
    if (i == item)
        return obj;
    i = i + 1;
}
Khoth
If myHashSet is large, then this will be a rather slow solution since on average, (n / 2) iterations will be needed to find the random object.
daniel
if your data is in a hash set, you need O(n) time. There's no way around it if you are just picking a single element and the data is stored in a HashSet.
David Nehme
@David Nehme: This is a drawback in the specification of HashSet in Java. In C++, it's typical to be able to directly access the buckets that make up the hashset, which allows us to more efficiently select a random elements.If random elements are necessary in Java, it might be worthwhile to define a custom hash set that allows the user to look under the hood. See [boost's docs][1] for a little more in this. [1] http://www.boost.org/doc/libs/1_43_0/doc/html/unordered/buckets.html
Aaron McDaid
If the set is not mutated over multiple accesses, you can copy it into an array and then access O(1). Just use myHashSet.toArray()
ykaganovich
+2  A: 

Can't you just get the size/length of the set/array, generate a random number between 0 and the size/length, then call the element whose index matches that number? HashSet has a .size() method, I'm pretty sure.

In psuedocode -

function randFromSet(target){
 var targetLength:uint = target.length()
 var randomIndex:uint = random(0,targetLength);
 return target[randomIndex];
}
matt lohkamp
This only works if the container in question supports random index lookup. Many container implementations don't (e.g., hash tables, binary trees, linked lists).
dchaley
A: 

PHP, assuming "set" is an array:

$foo = array("alpha", "bravo", "charlie");
$index = array_rand($foo);
$val = $foo[$index];

The Mersenne Twister functions are better but there's no MT equivalent of array_rand in PHP.

dirtside
A: 

PHP, using MT:

$items_array = array("alpha", "bravo", "charlie");
$last_pos = count($items_array) - 1;
$random_pos = mt_rand(0, $last_pos);
$random_item = $items_array[$random_pos];
da5id
+1  A: 

Javascript solution ;)

function choose (set) {
 return set[Math.floor(Math.random() * set.length)];
}

var set  = [1, 2, 3, 4], rand = choose (set);

Or alternatively:

Array.prototype.choose = function () {
 return this[Math.floor(Math.random() * this.length)];
};

[1, 2, 3, 4].choose();
Mathew Byrne
I prefer the second alternative. :-)
marcospereira
ooh, I like extending adding the new array method!
matt lohkamp
+7  A: 

A somewhat related Did You Know:

There are useful methods in java.util.Collections for shuffling whole collections:

/**
 * Randomly permutes the specified list using a default source of
 * randomness.  All permutations occur with approximately equal
 * likelihood.<p>
 *
 * The hedge "approximately" is used in the foregoing description because
 * default source of randomness is only approximately an unbiased source
 * of independently chosen bits. If it were a perfect source of randomly
 * chosen bits, then the algorithm would choose permutations with perfect
 * uniformity.<p>
 *
 * This implementation traverses the list backwards, from the last element
 * up to the second, repeatedly swapping a randomly selected element into
 * the "current position".  Elements are randomly selected from the
 * portion of the list that runs from the first element to the current
 * position, inclusive.<p>
 *
 * This method runs in linear time.  If the specified list does not
 * implement the {@link RandomAccess} interface and is large, this
 * implementation dumps the specified list into an array before shuffling
 * it, and dumps the shuffled array back into the list.  This avoids the
 * quadratic behavior that would result from shuffling a "sequential
 * access" list in place.
 *
 * @param  list the list to be shuffled.
 * @throws UnsupportedOperationException if the specified list or
 *         its list-iterator does not support the <tt>set</tt> operation.
 */
public static void shuffle(List<?> list) {
   ...
}

and also

public static void shuffle(List<?> list, Random rnd) {
  ...
}
Ash Kim
very cool. didn't know this existed.
Owen
+1  A: 

Perl 5

@hash_keys = (keys %hash);
$rand = int(rand(@hash_keys));
print $hash{$hash_keys[$rand]};

Here is one way to do it.

J.J.
A: 

Icon has a set type and a random-element operator, unary "?", so the expression

? set( [1, 2, 3, 4, 5] )

will produce a random number between 1 and 5.

The random seed is initialized to 0 when a program is run, so to produce different results on each run use randomize()

Hugh Allen
+2  A: 

In C#

        Random random = new Random((int)DateTime.Now.Ticks);

        OrderedDictionary od = new OrderedDictionary();

        od.Add("abc", 1);
        od.Add("def", 2);
        od.Add("ghi", 3);
        od.Add("jkl", 4);


        int randomIndex = random.Next(od.Count);

        Console.WriteLine(od[randomIndex]);

        // Can access via index or key value:
        Console.WriteLine(od[1]);
        Console.WriteLine(od["def"]);
Mitch Wheat
+1  A: 

In lisp

(defun pick-random (set)
       (nth (random (length set)) set))
inglesp
+2  A: 

If you want to do it in Java, you should consider copying the elements into some kind of random-access collection (such as an ArrayList). Because, unless your set is small, accessing the selected element will be expensive (O(n) instead of O(1)).

Alternatively, you could look for another Set implementation that more closely matches your requirements. The ListOrderedSet from Commons Collections looks promising.

Dan Dyer
Copying to a list will cost O(n) in time and also use O(n) memory, so why would that be a better choice than fetching from the map directly?
mdma
A: 

Clojure solution:

(defn pick-random [set] (let [sq (seq set)] (nth sq (rand-int (count sq)))))
pjb3
+1  A: 

Here's how you'd do it in java if you wanted a specific weights applied to the set/array.


public static Object getRandom(Set s,double[] weight){
     return this.getRandom(s.toArray(),weight);
}
public static Object getRandom(Object[] obj, double[] weight){
     if (ary.length != weight.length) {
     throw new IllegalArgumentException(
       "Array and array weights must be of equal length.");
     }
    double totWeight = sumWeights(weight);
    if (totWeight == 0.0) {
        return null;
    }
    double rnd = Math.random() * totWeight;
        for (int i = 0; i 
A: 

If the hashmap constantly grows in your code, then you can simply select the first element of its HashSet. That is

Object randElement = yourMap.get(yourMap.keySet().iterator().next());
+1  A: 

Unfortunately, this cannot be done efficiently (better than O(n)) in any standard set containers I know of.

This is odd, since it is very easy to add a randomized pick function to hash sets as well as binary sets. In a not to sparse hash set, you can try random entries, until you get a hit. For a binary tree, you can choose randomly between the left or right subtree, with a maximum of O(log2) steps. I've implemented a demo of the later below:

import random

class Node:
    def __init__(self, object):
        self.object = object
        self.value = hash(object)
        self.size = 1
        self.a = self.b = None

class RandomSet:
    def __init__(self):
        self.top = None

    def add(self, object):
        """ Add any hashable object to the set.
            Notice: In this simple implementation you shouldn't add two
                    identical items. """
        new = Node(object)
        if not self.top: self.top = new
        else: self._recursiveAdd(self.top, new)
    def _recursiveAdd(self, top, new):
        top.size += 1
        if new.value < top.value:
            if not top.a: top.a = new
            else: self._recursiveAdd(top.a, new)
        else:
            if not top.b: top.b = new
            else: self._recursiveAdd(top.b, new)

    def pickRandom(self):
        """ Pick a random item in O(log2) time.
            Does a maximum of O(log2) calls to random as well. """
        return self._recursivePickRandom(self.top)
    def _recursivePickRandom(self, top):
        r = random.randrange(top.size)
        if r == 0: return top.object
        elif top.a and r <= top.a.size: return self._recursivePickRandom(top.a)
        return self._recursivePickRandom(top.b)

if __name__ == '__main__':
    s = RandomSet()
    for i in [5,3,7,1,4,6,9,2,8,0]:
        s.add(i)

    dists = [0]*10
    for i in xrange(10000):
        dists[s.pickRandom()] += 1
    print dists

I got [995, 975, 971, 995, 1057, 1004, 966, 1052, 984, 1001] as output, so the distribution seams good.

I've struggled with the same problem for myself, and I haven't yet decided weather the performance gain of this more efficient pick is worth the overhead of using a python based collection. I could of course refine it and translate it to C, but that is too much work for me today :)

Thomas Ahle
A: 

C++. This should be reasonably quick, as it doesn't require iterating over the whole set, or sorting it. This should work out of the box with most modern compilers, assuming they support tr1. If not, you may need to use Boost.

The Boost docs are helpful here to explain this, even if you don't use Boost.

The trick is to make use of the fact that the data has been divided into buckets, and to quickly identify a randomly chosen bucket (with the appropriate probability).

//#include <boost/unordered_set.hpp>  
//using namespace boost;
#include <tr1/unordered_set>
using namespace std::tr1;
#include <iostream>
#include <stdlib.h>
#include <assert.h>
using namespace std;

int main() {
  unordered_set<int> u;
  u.max_load_factor(40);
  for (int i=0; i<40; i++) {
    u.insert(i);
    cout << ' ' << i;
  }
  cout << endl;
  cout << "Number of buckets: " << u.bucket_count() << endl;

  for(size_t b=0; b<u.bucket_count(); b++)
    cout << "Bucket " << b << " has " << u.bucket_size(b) << " elements. " << endl;

  for(size_t i=0; i<20; i++) {
    size_t x = rand() % u.size();
    cout << "we'll quickly get the " << x << "th item in the unordered set. ";
    size_t b;
    for(b=0; b<u.bucket_count(); b++) {
      if(x < u.bucket_size(b)) {
        break;
      } else
        x -= u.bucket_size(b);
    }
    cout << "it'll be in the " << b << "th bucket at offset " << x << ". ";
    unordered_set<int>::const_local_iterator l = u.begin(b);
    while(x>0) {
      l++;
      assert(l!=u.end(b));
      x--;
    }
    cout << "random item is " << *l << ". ";
    cout << endl;
  }
}
Aaron McDaid