tags:

views:

304

answers:

7

I think I've settled on this as the most simple and unit-testable method for randomising a list, but would be interested to hear of any improvements.

public static IList<T> RandomiseList<T>(IList<T> list, int seed)
{
    Random random = new Random(seed);
    List<T> takeFrom = new List<T>(list);
    List<T> ret = new List<T>(takeFrom.Count);

    while (takeFrom.Count > 0)
    {
        int pos = random.Next(0, takeFrom.Count - 1);
        T item = takeFrom[pos];
        takeFrom.RemoveAt(pos);
        ret.Add(item);
    }

    return ret;
}
+17  A: 

You want a shuffle, and the best way to do that is the Fisher-Yates shuffle:

public static IList<T> Randomise<T>(IList<T> list, int seed) 
{
    Random rng = new Random(seed); 

    List<T> ret = new List<T>(list);      
    int n = ret.Length;            
    while (n > 1) 
    {
        n--;                         
        int k = rng.Next(n + 1);  
        // Simple swap of variables
        T tmp = list[k];
        ret[k] = ret[n];
        ret[n] = tmp;
    }
    return ret;
}
Joel Coehoorn
Oh that's what i was going for! You beat me to it! :)
Filip Ekberg
Isn't that what he has, though?
PeterAllenWebb
No, it's not what he has. He's moving items from one list to another rather than swapping in place.
Joel Coehoorn
Okay. Sure. But they are both Fisher-Yates, and therefore result in all output-orders having an equal probability.
PeterAllenWebb
So basically, working from the end of the list, you swap each item with another item chosen at random? That makes sense - cheers!
Neil Barnwell
@PeterAllen: I'll give you that. But I think he's better off building one new list rather than two.
Joel Coehoorn
Oh, and having now seen it, I like the iterator blocks solution by Dennis better anyway: http://stackoverflow.com/questions/1627064/how-could-i-improve-this-c-randomising-method/1627132#1627132
Joel Coehoorn
Quick fix needed: your method does not return anything.
Greg
@Greg: ty, fixed.
Joel Coehoorn
Also see @Guffa's answer, which is even better than Dennis'
Joel Coehoorn
A: 

No stats to support this but it would seem better if your return value starts as an array of the same length as the list and then you insert values directly into a randomly generated index.

George Mauer
But then you have to think about collisions, and the obvious approach (choose the first free slot) gives the wrong distribution
Simon Nickerson
I think I've sort've done this by specifying the initial capacity of the return list, based on JS Bangs' suggestion.
Neil Barnwell
I actually meant to actually place directly to an index rather than use add but clearly Joel Cahoon's suggestion is best - it has an algorithm name
George Mauer
+2  A: 

This looks good to me. Note that you'll get slightly better performance (especially for large lists) if you initialize ret with the length of list, so that the list doesn't have to be reallocated:

List<T> ret = new List<T>(list.Count);
JSBangs
+1 Done, thanks for the pointer!
Neil Barnwell
That's won't do much. The most of the time will clearly be spent in the RemoveAt method, as that has to move all the items following the removed one each time. As long as that one is still there, it's rather pointless to try to save anything elsewhere.
Guffa
+3  A: 

Not sure how much of an improvement this is, but would have performance benefits if the list is large and you only need the first few random items.

public static IEnumerable<T> RandomiseList<T>(IList<T> list, int seed)
{
    Random random = new Random(seed);
    List<T> takeFrom = new List<T>(list);

    while (takeFrom.Count > 0)
    {
        int pos = random.Next(0, takeFrom.Count - 1);
        T item = takeFrom[pos];
        takeFrom.RemoveAt(pos);
        yield return item;
    }
}

Removes the need for a temp list or even a temp swap variable.

If I was going to be using this a lot, I'd rewrite it as an extension method.

Dennis Palmer
<3 me some iterator blocks
Joel Coehoorn
Also, it's only 9 extra characters after calling this to get your list back if you really need it.
Joel Coehoorn
It still has the RemoveAt method, which will make it slow...
Guffa
@Guffa: what you recommend instead? A HashSet to track already rolled indexes?
Joel Coehoorn
@Joel: A List works fine. You just swap the items so that the unused ones are at the end. Actually you don't have to swap them as the returned item will never be used any more, you only have to move one item and return the other. See the implementation that I posted.
Guffa
+14  A: 

I liked Dennis Palmers idea of returning a shuffled IEnumerable instead of shuffle the list in place, but using the RemoveAt method makes it slow. Here is an alternative without the RemoveAt method:

public static IEnumerable<T> Shuffle<T>(IEnumerable<T> list, int seed) {
  Random rnd = new Random(seed);
  List<T> items = new List<T>(list);
  for (int i = 0; i < items.Count; i++) {
    int pos = rnd.Next(i, items.Count);
    yield return items[pos];
    items[pos] = items[i];
  }
}

I thried this with 10000 integers, and it's about 30 times faster.

Guffa
Very nice. Could maybe be improved by counting backwards (simpler rnd.Next call) +1
Joel Coehoorn
Very tempted to delete my own in favor of this. Need to recruit a few more votes for you first so it ends up listed 2nd behind me.
Joel Coehoorn
@Joel: I don't think that you should delete your answer, it shows the fundamental principle of an efficient shuffle in a clear way.
Guffa
Both answers are nice, but this one really rocks. +1 !
Cloud
+2  A: 

What sort of suggestions are you looking for exactly? efficiency? correctness? You do mention unit testing ... I think there could definitely be an improvement there.

I actually helped develop an online game and their shuffling mechanism. I don't really suspect performance is much of an issue, as most algorithms you find are by and large the same. I would suggest the following however,

a. create a random interface

public interface IRandom
{
    byte NextRandomByte ();
}

Anything that now consumes this interface can now be mocked\unit tested in a controlled manner or environment. You do not really want to be unit testing truly random algorithms - you won't be able to verify your data!

As for why return a byte, a byte is likely the smallest unit of randomness you could want. Not only that, but if given a means of generating a single random byte, generating a sequence of them and concatenating them together is an easy way of generating an even wider range of random data.

Of course, you will have to be wary of introduing bias to your data ...

b. Ensure quality of data by reducing bias over arbitrary intervals. Assuming underlying data is uniformly random, any interval that is NOT a factor of 256 will introduce bias. Consider this,

// 250 is not a factor of 256!
byte a = random.NextRandomByte () % 250; // values 0-5 are biased!

In the preceeding snippet, values 0-5 have a 2/255 probability to come up, while values 6-249 have a 1/255 probability to come up. That is a significant bias over time. One approach is to check the number coming from the generator, and discard it if it exceeds an acceptable range

// continually generate random data until it is satisfactory
for (byte r = random.NextRandomByte (); r > 250; r = random.NextRandomByte ())
{
}
byte a = r % 250; // r is guaranteed to be on [0, 250], no longer bias

"Acceptable range" may be determined by finding the greatest multiple of your interval that can be represented by your value type. A more generalized form

byte modulo; // specified as parameter
byte biasThreshold = (byte.MaxValue / modulo) * modulo;
for (; unbiasedValue >= biasThreshold; )
{
    // generate value
    unbiasedValue = random.NextRandomByte ();
}

And if you want values greater than byte, simply concatenate the values together,

int modulo; // specified as parameter
int biasThreshold = (int.MaxValue / modulo) * modulo;
for (; unbiasedValue >= biasThreshold; )
{
    // generate value
    byte a = random.NextRandomByte ();
    byte b = random.NextRandomByte ();
    ... 
    int unbiasedValue = a << 24 + b << 16 + c << 8 + d;
}

c. Consume! Place your algorithms or helpers in stateless extension or static classes, like

// forgive my syntax, recalling from memory
public static class IRandomExtensions
{
    public int GetUnbiasedInteger (this IRandom random, int modulo) { }
    public int GetUnbiasedUnsignedInteger (this IRandom random, uint modulo) { }
    public int GetUnbiasedLong (this IRandom random, long modulo) { }
    public int GetUnbiasedUnsignedLong (this IRandom random, ulong modulo) { }
    ...
}

public static class IEnumerableExtensions
{
    public IEnumerable<T> Shuffle<T>(this IEnumerable<T> items, IRandom random) 
    {
        // shuffle away!
        ...
    }

}

Deciding whether or not to implement these as methods on your interface or as external methods [as i've done] is up to you - but keep in mind, making them member methods forces implementors to repeat or duplicate code. Personally, I like extensions. They are very clean. And sexy.

int randomNumber = random.UnbiasedInteger (i - 1);
List<int> shuffledNumbers = numbers.Shuffle (random);


Clearly all of the preceeding is optional, but facilitates unit testing and improves overall quality of your random data.

Random and "fair" dice is a very interesting topic in general. If you are at all interested, I strongly recommend you Google it sometime and perform some research. :)

johnny g
Actually, the built-in prng is already unit-testable in isolation. Just keep passing the same seed, and you'll keep getting the same values back. Of course, the key there is isolation. Eventually you want to test code that relies on it and uses a real seed. But even here, it might be simpler to abstract the seed-selector so you get consistent seeds for your tests.
Joel Coehoorn
Hm, some of you may be curious to know how useful any of this is. For one, as I've said, you can unit test predictably. For another, while in SIT, I had several implementation of IRandom, you can probably guess their underlying generators by name alone - ByteQueueRandom, GuidRandom, CryptographicRandom. I could go from predictable, to pseudo, to random with a config change. Oh, and syntactically, myList.Shuffle(random) looks pretty sweet too :)
johnny g
@Joel Coehoorn you're absolutely right, i kinda overlooked that. but if given a choice between controlling the generator or seeding someone else's implementation - however *unlikely* it is to change wrt to my inputs - i would rather opt for control.
johnny g
A: 

Be aware of the risks of naive shuffling algorithms, that look good, but do not stand up to testing!

Check this excellent article for an example.

abelenky