views:

6883

answers:

14

I was just wondering what is the best way to randomize an array of strings in C#. My array contains about 500 strings and I'd like to create a new Array with the same strings but in a random order.

Any suggestions?

+1  A: 

Generate an array of random floats or ints of the same length. Sort that array, and do corresponding swaps on your target array.

This yields a truly independent sort.

Nick
Seems rather wasteful.
Matt Howells
A: 

Randomizing the array is intensive as you have to shift around a bunch of strings. Why not just randomly read from the array? In the worst case you could even create a wrapper class with a getNextString(). If you really do need to create a random array then you could do something like

for i = 0 -> i= array.length * 5
   swap two strings in random places

The *5 is arbitrary.

stimms
A random read from the array is likely to hit some items multiple times and miss others!
Ray Hayes
The shuffle algorithm is broken. You would have to make your arbitrary 5 very high indeed before your shuffle is unbiased.
dysfunctor
+11  A: 

See Jeff Atwood's own posts on shuffling and the one where he found an error in his own naive shuffling code

This must gain me some reputation ;)

gnobal
Indeed. The Knuth shuffle is a quick and easy way to reorder the array, and it's guaranteed to be unbiased.
Nick Johnson
+4  A: 
List<string> stringlist = new List<string>();

// add your values to stringlist

Random r = new Random();

List<string> res = new List<string>();

while (stringlist.Length >0)
{
   int i = r.GetNext(stringlist.Length);
   res.Add(stringlist[i]);
   stringlist.RemoveAt(i);
}
Sklivvz
This has the same effect as a knuth shuffle, but it's not as efficient, since it involves depopulating one list and repopulating another. Swapping items in place would be a better solution.
Nick Johnson
I find this elegant and easily understandable and on 500 strings it doesn't make a bit of a difference...
Sklivvz
+20  A: 

If you're on .NET 3.5, you can use the following IEnumerable coolness (VB.NET, not C#, but the idea should be clear...):

Dim rnd As New System.Random
Dim MyRandomArray = MyArray.OrderBy(Function() rnd.Next)

Edit: OK and here's the corresponding C# code:

Random rnd=new Random();
string[] MyRandomArray = MyArray.OrderBy(x => rnd.Next()).ToArray();

Second edit, in response to remarks that System.Random "isn't threadsafe" and "only suitable for toy apps" due to returning a time-based sequence: as used in my example, Random() is perfectly thread-safe, unless you're allowing the routine in which you randomize the array to be re-entered, in which case you'll need something like lock (MyRandomArray) anyway in order not to corrupt your data, which will protect rnd as well.

Also, it should be well-understood that System.Random as a source of entropy isn't very strong. As noted in the MSDN documentation, you should use something derived from System.Security.Cryptography.RandomNumberGenerator if you're doing anything security-related. For example:

using System.Security.Cryptography;

...

RNGCryptoServiceProvider rnd = new RNGCryptoServiceProvider();
string[] MyRandomArray = MyArray.OrderBy(x => GetNextInt32(rnd)).ToArray();

...

static int GetNextInt32(RNGCryptoServiceProvider rnd)
    {
        byte[] randomInt = new byte[4];
        rnd.GetBytes(randomInt);
        return Convert.ToInt32(randomInt[0]);
    }
mdb
two notes: 1) System.Random is not thread-safe (you've been warned) and 2) System.Random is time based, so if you use this code in a heavily concurrent system, it's possible for two requests to get the same value (i.e. in webapps)
therealhoff
Just to clarify the above, System.Random will seed itself using the current time, so two instances created simultaneously will generate the same "random" sequence..System.Random should only be used in toy apps
therealhoff
Also this algorithm is O(n log n) and biased by the Qsort algorithm. See my answer for an O(n) unbiased solution.
Matt Howells
-1: Not the "best" algorithm for the same reason as mentioned by Matt above.
Juliet
Unless `OrderBy` caches the sort keys internally, this also has the problem of violating the transitive property of ordered comparisons. If there is ever a debug mode verification that `OrderBy` produced correct results, then in theory it could throw an exception.
280Z28
Edit: `OrderBy` does in fact cache the keys. However, I don't see this as part of the public specification so there's no guarantee that all implementations will do so.
280Z28
A: 

Check out Jeff Atwood's post on shuffling cards.

JKueck
A: 

Just thinking off the top of my head, you could do this:

public string[] Randomize(string[] input)
{
  List<string> inputList = input.ToList();
  string[] output = new string[input.Length];
  Random randomizer = new Random();
  int i = 0;

  while (inputList.Count > 0)
  {
    int index = r.Next(inputList.Count);
    output[i++] = inputList[index];
    inputList.RemoveAt(index);
  }

  return (output);
}
Tarsier
+1  A: 
Random r = new Random();
List<string> list = new List(originalArray);
List<string> randomStrings = new List();

while(list.Count > 0)
{
int i = r.Random(list.Count);
randomStrings.Add(list[i]);
list.RemoveAt(i);
}
nullDev
+2  A: 

You're looking for a shuffling algorithm, right?

Okay, there are two ways to do this: the clever-but-people-always-seem-to-misunderstand-it-and-get-it-wrong-so-maybe-its-not-that-clever-after-all way, and the dumb-as-rocks-but-who-cares-because-it-works way.

Dumb way

  • Create a duplicate of your first array, but tag each string should with a random number.
  • Sort the duplicate array with respect to the random number.

This algorithm works well, but make sure that your random number generator is unlikely to tag two strings with the same number. Because of the so-called Birthday Paradox, this happens more often than you might expect. Its time complexity is O(n log n).

Clever way

I'll describe this as a recursive algorithm:

To shuffle an array of size n (indices in the range [0..n-1]):

if n = 0
  • do nothing
if n > 0
  • (recursive step) shuffle the first n-1 elements of the array
  • choose a random index, x, in the range [0..n-1]
  • swap the element at index n-1 with the element at index x

The iterative equivalent is to walk an iterator through the array, swapping with random elements as you go along, but notice that you cannot swap with an element after the one that the iterator points to. This is a very common mistake, and leads to a biased shuffle.

Time complexity is O(n).

dysfunctor
A: 

Here's a simple way using OLINQ:

// Input array
List<String> lst = new List<string>();
for (int i = 0; i < 500; i += 1) lst.Add(i.ToString());

// Output array
List<String> lstRandom = new List<string>();

// Randomize
Random rnd = new Random();
lstRandom.AddRange(from s in lst orderby rnd.Next(100) select s);
Seth Morris
+15  A: 

The 'sort by a random number' technique should be fine for most practical purposes.

var shuffled = myArray.OrderBy(x => Guid.NewGuid());

This is O(n log n) in complexity, and requires the creation of a Guid for each element in the array.

The following implementation uses the Fisher-Yates algorithm. It runs in O(n) time and shuffles in place, so is better performing than the 'sort by random', although it is more lines of code. See here for some comparative performance measurements. For practical purposes I have used System.Random, which is fine for short arrays and for non-cryptographic purposes. For longer arrays, in order to make the (extremely large) number of permutations equally probable it would be necessary to run a pseudo-random number generator through many iterations for each swap to produce enough entropy. For a 500-element array only a very small fraction of the possible 500! permutations will be possible using a PRNG.

class Shuffler
{
    private Random rng = new Random();

    public void Shuffle<T> (T[] array) 
    {
        int n = array.length;
        while (n > 1) 
        {
            int k = rng.Next(n);
            n--;
            T temp = array[n];
            array[n] = array[k];
            array[k] = temp;
        }
    }
}

Usage:

var shuffler = new Shuffler();
var myArray = new int[] {1, 2, 3, 4};
shuffler.Shuffle(myArray);
Matt Howells
A: 

Jacco, your solution ising a custom IComparer isn't safe. The Sort routines require the comparer to conform to several requirements in order to function properly. First among them is consistency. If the comparer is called on the same pair of objects, it must always return the same result. (the comparison must also be transitive).

Failure to meet these requirements can cause any number of problems in the sorting routine including the possibility of an infinite loop.

Regarding the solutions that associate a random numeric value with each entry and then sort by that value, these are lead to an inherent bias in the output because any time two entries are assigned the same numeric value, the randomness of the output will be compromised. (In a "stable" sort routine, whichever is first in the input will be first in the output. Array.Sort doesn't happen to be stable, but there is still a bias based on the partitioning done by the Quicksort algorithm).

You need to do some thinking about what level of randomness you require. If you are running a poker site where you need cryptographic levels of randomness to protect against a determined attacker you have very different requirements from someone who just wants to randomize a song playlist.

For song-list shuffling, there's no problem using a seeded PRNG (like System.Random). For a poker site, it's not even an option and you need to think about the problem a lot harder than anyone is going to do for you on stackoverflow. (using a cryptographic RNG is only the beginning, you need to ensure that your algorithm doesn't introduce a bias, that you have sufficient sources of entropy, and that you don't expose any internal state that would compromise subsequent randomness).

A: 

This post has already been pretty well answered - use a Durstenfeld implementation of the Fisher-Yates shuffle for a fast and unbiased result. There have even been some implementations posted, though I note some are actually incorrect.

I wrote a couple of posts a while back about implementing full and partial shuffles using this technique, and (this second link is where I'm hoping to add value) also a follow-up post about how to check whether your implementation is unbiased, which can be used to check any shuffle algorithm. You can see at the end of the second post the effect of a simple mistake in the random number selection can make.

Greg Beech
A: 

You can also make an extention method out of Matt Howells. Example.

   namespace System
    {
        public static class MSSystemExtenstions
        {
            private static Random rng = new Random();
            public static void Shuffle<T>(this T[] array)
            {
                rng = new Random();
                int n = array.Length;
                while (n > 1)
                {
                    int k = rng.Next(n);
                    n--;
                    T temp = array[n];
                    array[n] = array[k];
                    array[k] = temp;
                }
            }
        }
    }

Then you can just use it like:

        string[] names = new string[] {
                "Aaron Moline1", 
                "Aaron Moline2", 
                "Aaron Moline3", 
                "Aaron Moline4", 
                "Aaron Moline5", 
                "Aaron Moline6", 
                "Aaron Moline7", 
                "Aaron Moline8", 
                "Aaron Moline9", 
            };
        names.Shuffle<string>();
Aaron