tags:

views:

478

answers:

6
+4  Q: 

Shuffle List<T>

Hi All

I have a list which contains many thousands of FilePath's to locations of audio files, and was wondering which would be the most efficient way to "shuffle" a List?

Any help is greatly appreciated :)

Thank you

+7  A: 

Fischer-Yates Shuffle or as it is also known as, Knuth shuffle.

Moron
...which is O(n), so you can't get any better than that.
Guffa
To the person who downvoted: Please leave a note, giving a chance to better the answer.
Moron
... I just upvoted you because I liked your answer :) Dunno why it was down-voted?
lucifer
btw, for faster shuffling, I would suggest you shuffle a list/array of integers (using whatever method you choose), and use that shuffled list/array as an index into the list of filenames. Swapping filenames could turn out be a bottleneck.
Moron
+2  A: 

Are you using Linq perchance? If so, you could try this extension method: Let's Do Some Shuffling

Gunny
Veyr interesting, works very well too. Thanks Gunny :)
lucifer
No prob - happy hunting! The link posted by thedugas has some good info in it as well.
Gunny
-1. Sorting is not the most efficient way of shuffling.
Guffa
-1: Sorting is the easiest--not the most efficient--way of shuffling.
Fake Code Monkey Rashid
+1  A: 

myList.OrderBy(Guid.NewGuid())

Esteban Araya
+6  A: 

See this post: http://stackoverflow.com/questions/273313/randomize-a-listt-in-c

thedugas
+4  A: 

Here is a simple (yet effective) implementation of the Fischer-Yates/Knuth shuffle:

Random rnd = new Random();
for (int i = files.Length; i > 1; i--) {
  int pos = rnd.Next(i);
  var x = files[i - 1];
  files[i - 1] = files[pos];
  files[pos] = x;
}

Or a slight variation:

Random rnd = new Random();
for (int i = 1; i < files.Length; i++) {
  int pos = rnd.Next(i + 1);
  var x = files[i];
  files[i] = files[pos];
  files[pos] = x;
}

As this is an O(n) operation, it's the most efficient way of shuffling a list. As all items in the list has to have chance to be moved, it's not possible to shuffle a list more efficiently than O(n).

I made a small performance test by shuffling a million items a thousand times each using this method and the currently accepted answer (LINQ OrderBy), and this is about 15 times (!) faster.

Guffa
You are confusing the asymptotic bound with efficiency. Efficiency is defined as resources consumed per unit of work done. Asymptotic bound describes how resources consumed increase as the problem size increases. The "find a substring of length m in a string of length n" algorithm in the System.String class is O(nm) but it is *far* more *efficient* on *typical* problems than the O(n+m) algorithms that we could have implemented. To work out efficiency you have to consider the *likely cases*, not the asymptotic bounds.
Eric Lippert
I also note that your implementation of Fischer-Yates has a bias; it does not produce all possible shuffles with equal likelihood. This is probably not a problem for a music shuffling algorithm, but it is a problem if you were using this to shuffle a deck of cards for a game of poker; given my hand, I could quickly determine what everyone else had in their hand.
Eric Lippert
@Eric: Why do you think that the implementation has a bias? It gives each item the same chance to end up in each postion in the list. Also I have verified the implementation by doing millions of shuffles, and there is no noticable bias.
Guffa
Let x be the number of possible random orders produced by Random. Let y be the number of possible orderings of the shuffle. I think you'll find that if you work out what x and y are, x is way, way, way smaller than y in your implementation, and therefore the shuffle is biased.
Eric Lippert
@Eric: For a list of n items x = n! and y = n!, so I don't see how x can be way, way, way smaller than y.
Guffa
@Guffa: Eric is right, the implementation you have is not Fisher Yates, even though it looks like it. You need to start from the other end.
Moron
@Moron: You are right that it's actually not the Fisher Yates implementation, but it works just fine. In what sense do you mean that Eric is right? (I added the Fisher Yates algorithm also.)
Guffa
The bias doesn't come from the algorithm. The bias comes from the source of "randomness". x is actually 2^32: the number of possible seeds. And since some of those seeds are way more likely than others -- since the seed is based on the clock, and when people choose to run software is not evenly distributed over time -- that's even more bias.
Eric Lippert
@Eric: Oh, I see what you mean, the Random class has an upper limit of 2^32 different random sequences. Well, at least it's less biased than the sorting method, as that has the same limit in the Random generator and also has a bias for duplicate random values...
Guffa
@Guffa: Sorry I misread Eric's comment. All I meant to say was this is not Fischer Yates. As for the bias issue: as you had implemented it, we can _prove_ that if the random generator is uniform, then the shuffle you produce is uniform (or 'unbiased'). Sorry for causing confusion.
Moron
+2  A: 

I added Jon Skeet's solution from this question to my Extensions library. I implemented methods that both take an external random number generator and create one using a default implementation (Random).

tvanfosson
Thank you for that. Very nice :D
lucifer