views:

2308

answers:

7

I have read an article about various shuffle algorithms over at Coding Horror. I have seen that somewhere people have done this to shuffle a list:

var r = new Random();
var shuffled = ordered.OrderBy(x => r.Next());

Is this a good shuffle algorithm? How does it work exactly? Is it an acceptable way of doing this?

+34  A: 

It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.

I prefer Durstenfield's variant of the Fisher-Yates shuffle which swaps elements.

Implementing a simple Shuffle extension method would basically consist of calling ToList or ToArray on the input then using an existing implementation of Fisher-Yates. (Pass in the Random as a parameter to make life generally nicer.) There are plenty of implementations around... I've probably got one in an answer somewhere.

The nice thing about such an extension method is that it would then be very clear to the reader what you're actually trying to do.

EDIT: Here's a simple implementation (no error checking!):

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    // Note i > 0 to avoid final pointless iteration
    for (int i = elements.Length-1; i > 0; i--)
    {
        // Swap element "i" with a random earlier element it (or itself)
        int swapIndex = rng.Next(i + 1);
        T tmp = elements[i];
        elements[i] = elements[swapIndex];
        elements[swapIndex] = tmp;
    }
    // Lazily yield (avoiding aliasing issues etc)
    foreach (T element in elements)
    {
        yield return element;
    }
}
Jon Skeet
Well, implementations for small, but important, things like this I would say is always nice to find here on StackOverflow. So yes please, if you want to =)
Svish
Jon -- your explanation of Fisher-Yates is equivalent to the implementation given in the question (the naive version). Durstenfeld/Knuth achieve O(n) not by assignment, but by selection from a decreasing set and swapping. This way the random number selected may repeat and the algorithm only takes O(n).
tvanfosson
Actually it's better than the naive version, but would still be O(n log n).
tvanfosson
@tvanfosson: I've always assumed the "swapping" version, which is O(n).
Jon Skeet
(I hadn't even thought of not doing the swapping version in fact! Will edit accordingly.)
Jon Skeet
aliasing issues?
Svish
@Svish: If you return an array (as `IEnumerable<T>` then store it in two places, then one place could cast back to an array and change what the other place "sees".
Jon Skeet
@Jon -- I think a better way to describe it is that it randomly assigns each element to a position in a new array rather than assigning a random number to each element and ordering by that number. Nice implementation though -- I actually have a use coming up for it -- part of a data anonymization effort.
tvanfosson
@tvanfosson: I'm describing what the code in the question *does*. The overall effect is to shuffle, but *how it works* is to assign a random number to each element and then sort. On the other hand, maybe I've confused things with the order of the paragraphs... I suspect so!
Jon Skeet
@Jon -- thanks for the clarification. I was, in fact, assuming you were describing your algorithm.
tvanfosson
BTW - you can get by with a little less arithmetic if you decrement from the end to the beginning.
tvanfosson
Will edit. Without tests I'm somewhat nervous, but...
Jon Skeet
You're probably getting sick of hearing from me on this, but I ran into a slight problem in my unit tests that you might want to be aware of. There's a quirk with ElementAt that makes it invoke the extension each time, giving unreliable results. In my tests I'm materializing the result before checking to avoid this.
tvanfosson
@tvanfosson: Not sick at all :) But yes, callers should be aware that it's lazily evaluated.
Jon Skeet
A bit late, but please note `source.ToArray();` requires you to have `using System.Linq;` in the same file. If you don't, you get this error: `'System.Collections.Generic.IEnumerable<T>' does not contain a definition for 'ToArray' and no extension method 'ToArray' accepting a first argument of type 'System.Collections.Generic.IEnumerable<T>' could be found (are you missing a using directive or an assembly reference?)`
R. Bemrose
(I'm not using LINQ in this project and had to search for a bit to find out why I was getting this error.)
R. Bemrose
+3  A: 

It's probablly ok for most purposes, and almost always it generates a truly random distribution (except when Random.Next() produces two identical random integers).

It works by assigning each element of the series a random integer, then ordering the sequence by these integers.

It's totally acceptable for 99.9% of the applications (unless you absolutely need to handle the edge case above). Also, skeet's objection to its runtime is valid, so if you're shuffling a long list you might not want to use it.

ripper234
A: 

This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values. Think of it as adding a new column to an in-memory table, then filling it with GUIDs, then sorting by that column. Looks like an efficient way to me (especially with the lambda sugar!)

Dave Swersky
+2  A: 

Seems like a good shuffling algorithm, if you're not too worried on the performance. The only problem I'd point out is that its behavior is not controllable, so you may have a hard time testing it.

One possible option is having a seed to be passed as a parameter to the random number generator (or the random generator as a parameter), so you can have more control and test it more easily.

Samuel Carrijo
good point about the testing ... you need it to be seeded
John Nicholas
+2  A: 

This has come up many times before. Search for Fisher-Yates on StackOverflow.

Here is a C# code sample I wrote for this algorithm. You can parameterize it on some other type, if you prefer.

static public class FisherYates
{
        static Random r = new Random();
        //      Based on Java code from wikipedia:
        //      http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
        static public void Shuffle(int[] deck)
        {
                for (int n = deck.Length - 1; n > 0; --n)
                {
                        int k = r.Next(n+1);
                        int temp = deck[n];
                        deck[n] = deck[k];
                        deck[k] = temp;
                }
        }
}
hughdbrown
+2  A: 

Slightly unrelated, but here is an interesting method (that even though it is really excessibe, has REALLY been implemented) for truly random generation of dice rolls!

Dice-O-Matic

The reason I'm posting this here, is that he makes some interesting points about how his users reacted to the idea of using algorithms to shuffle, over actual dice. Of course, in the real world, such a solution is only for the really extreme ends of the spectrum where randomness has such an big impact and perhaps the impact affects money ;).

Irfy
Very interesting! I've just used that, and added my own bits to it in a playlist "shuffle" feature in my own personal project! Thanks!
lucifer
+14  A: 

This is based on Jon Skeet's answer.

In that answer, the array is shuffled, then returned using yield. The net result is that the array is kept in memory for the duration of foreach, as well as objects necessary for iteration, and yet the cost is all at the beginning - the yield is basically an empty loop.

This algorithm is used a lot in games, where the first three items are picked, and the others will only be needed later if at all. My suggestion is to yield the numbers as soon as they are swapped. This will reduce the start-up cost, while keeping the iteration cost at O(1) (basically 5 operations per iteration). The total cost would remain the same, but the shuffling itself would be quicker. In cases where this is called as collection.Shuffle().ToArray() it will theoretically make no difference, but in the aforementioned use cases it will speed start-up. Also, this would make the algorithm useful for cases where you only need a few unique items. For example, if you need to pull out three cards from a deck of 52, you can call deck.Shuffle().Take(3) and only three swaps will take place (although the entire array would have to be copied first).

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    // Note i > 0 to avoid final pointless iteration
    for (int i = elements.Length - 1; i > 0; i--)
    {
        // Swap element "i" with a random earlier element it (or itself)
        int swapIndex = rng.Next(i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
        // we don't actually perform the swap, we can forget about the
        // swapped element because we already returned it.
    }

    // there is one item remaining that was not returned - we return it now
    yield return elements[0]; 
}
configurator
Neat - I like it :)
Jon Skeet
Ouch! This will likely not return all the items in the source. You can't rely on a random number being unique for N iterations.
P Daddy
Clever! (And I hate this 15 character stuff...)
Svish
@P Daddy: Huh? Care to elaborate?
Svish
@Svish: An extreme example: `rng.Next(i + 1)` *could* return zero every time, just like a flipped quarter could come up heads 15 times in a row. Although it won't likely actually come up zero N times in a row, *some* number of repeats is very likely, so the chances of complete coverage are rather low.
P Daddy
Wait a minute. Never mind. I wasn't paying enough attention. The half-swap takes care of repeated random numbers. If it came up zero every time it would just effectively reverse all but the first element.
P Daddy
There is still a minor problem, though. This returns only N-1 elements. If you have 10 elements in the source, this returns only 9 of them. Add `yield return elements[0];` after the `for` loop to correct it.
P Daddy
D'oh! I guess I wasn't paying enough attention either...
configurator
+1 for smart thinking, btw
P Daddy
Or you could replace the > 0 with >= 0 and not have to (although, an extra RNG hit plus a redundant assignment)
FryGuy