ansaurus

Question

Is it correct to use JavaScript Array.sort() method for shuffling?

Answer 1

+22 A:

It's never been my favourite way of shuffling, partly because it is implementation-specific as you say. In particular, I seem to remember that the standard library sorting from either Java or .NET (not sure which) can often detect if you end up with an inconsistent comparison between some elements (e.g. you first claim A < B and B < C, but then C < A).

It also ends up as a more complex (in terms of execution time) shuffle than you really need.

I prefer the shuffle algorithm which effectively partitions the collection into "shuffled" (at the start of the collection, initially empty) and "unshuffled" (the rest of the collection). At each step of the algorithm, pick a random unshuffled element (which could be the first one) and swap it with the first unshuffled element - then treat it as shuffled (i.e. mentally move the partition to include it).

This is O(n) and only requires n-1 calls to the random number generator, which is nice. It also produces a genuine shuffle - any element has a 1/n chance of ending up in each space, regardless of its original position (assuming a reasonable RNG). The sorted version approximates to an even distribution (assuming that the random number generator doesn't pick the same value twice, which is highly unlikely if it's returning random doubles) but I find it easier to reason about the shuffle version :)

This approach is called a Fisher-Yates shuffle.

I would regard it as best practice to code up this shuffle once and reuse it everywhere you need to shuffle items. Then you don't need to worry about sort implementations in terms of reliability or complexity. It's only a few lines of code (which I won't attempt in JavaScript!)

The Wikipedia article on shuffling (and in particular the shuffle algorithms section) talks about sorting a random projection - it's worth reading the section on poor implementations of shuffling in general, so you know what to avoid.

Jon Skeet 2009-06-07 21:08:19

Fisher–Yates shuffle?

sixlettervariables 2009-06-07 21:10:00

Yup - was just editing after looking it up on Wikipedia :)

Jon Skeet 2009-06-07 21:11:05

Raymond Chen goes in depth on the importance that sort comparison functions follow the rules: http://blogs.msdn.com/oldnewthing/archive/2009/05/08/9595334.aspx

binarycoder 2009-06-07 22:45:54

if my reasoning is correct, the sorted version *does not* produce a 'genuine' shuffle!

Christoph 2009-06-07 23:36:54

Will edit to give "approximately".

Jon Skeet 2009-06-08 05:25:13

@Christoph: Thinking about it, even Fisher-Yates will only give a "perfect" distribution if rand(x) is guaranteed to be *exactly* even over its range. Given that there are usually 2^x possible states for the RNG for some x, I don't think it'll be *exactly* even for rand(3).

Jon Skeet 2009-06-08 05:28:05

@Jon: but Fisher-Yates will create `2^x` states for each array index, ie there'll be 2^(xn) states total, which should be quite a bit larger that 2^c - see my edited answer for details

Christoph 2009-06-08 07:22:38

@Christoph: I may not have explained myself properly. Suppose you have just 3 elements. You pick the first element randomly, out of all 3. To get a *completely uniform* distribution, you'd have to be able to choose a random number in the range [0,3) totally uniformly - and if the PRNG has 2^n possible states, you can't do that - one or two of the possibilities will have a *slightly* higher probability of occurring.

Jon Skeet 2009-06-08 08:26:07

@Jon: you're right and my argument is flawed; I actually posted a comment detailing this, than deleted it again and sent the one above; I believe I should do some more thinking before answering again...

Christoph 2009-06-08 08:39:09

I think I'll accept this answer, as it has provided me with a lot of references to good resources. Although other answers were really valuable too, especially the one by Christoph.

Rene Saarsoo 2009-06-08 08:45:29

Answer 2

+1 A:

It is a hack, certainly. In practice, an infinitely looping algorithm is not likely. If you're sorting objects, you could loop through the coords array and do something like:

for (var i = 0; i < coords.length; i++)
    coords[i].sortValue = Math.random();

coords.sort(useSortValue)

function useSortValue(a, b)
{
  return a.sortValue - b.sortValue;
}

(and then loop through them again to remove the sortValue)

Still a hack though. If you want to do it nicely, you have to do it the hard way :)

Thorarin 2009-06-07 21:10:44

Answer 3

A:

There is nothing wrong with it.

The function you pass to .sort() usually looks something like

function sortingFunc( first, second )
{
  // example:
  return first - second ;
}

Your job in sortingFunc is to return:

a negative number if first goes before second
a positive number if first should go after second
and 0 if they are completely equal

The above sorting function puts things in order.

If you return -'s and +'s randomly as what you have, you get a random ordering.

Like in MySQL:

SELECT * from table ORDER BY rand()

bobobobo 2009-06-07 21:38:22

there *is* something wrong with this approach: depending on the sorting algorithm in use by the JS implementation, the probabilities won't be equally distributed!

Christoph 2009-06-07 21:51:06

Is that something we practically worry about?

bobobobo 2009-06-07 21:52:25

@bobobobo: depending on the application, yes, sometimes we do; also, a correctly working `shuffle()` only has to be written once, so it's not really an issue: just put the snippet in your code vault and unearth it whenever you need it

Christoph 2009-06-07 21:58:38

Answer 4

+9 A:

Christoph 2009-06-07 21:41:59

Thanks for the implementation. It's blazingly fast! Especially compared to that slow crap that I wrote by myself in the meantime.

Rene Saarsoo 2009-06-08 08:22:00

Answer 5

+2 A:

I think it's fine for cases where you're not picky about distribution and you want the source code to be small.

In JavaScript (where the source is transmitted constantly), small makes a difference in bandwidth costs.

Nosredna 2009-06-07 22:09:22

Answer 6

+2 A:

I did some measurements of how random the results of this random sort are...

My technique was to take a small array [1,2,3,4] and create all (4! = 24) permutations of it. Then I would apply the shuffling function to the array a large number of times and count how many times each permutation is generated. A good shuffling algoritm would distribute the results quite evenly over all the permutations, while a bad one would not create that uniform result.

Using the code below I tested in Firefox, Opera, Chrome, IE6/7/8.

Surprisingly for me, the random sort and the real shuffle both created equally uniform distributions. So it seems that (as many have suggested) the main browsers are using merge sort. This of course doesn't mean, that there can't be a browser out there, that does differently, but I would say it means, that this random-sort-method is reliable enough to use in practice.

EDIT: This test didn't really measured correctly the randomness or lack thereof. See the other answer I posted.

But on the performance side the shuffle function given by Cristoph was a clear winner. Even for small four-element arrays the real shuffle performed about twice as fast as random-sort!

// The shuffle function posted by Cristoph.
var shuffle = function(array) {
    var tmp, current, top = array.length;

    if(top) while(--top) {
        current = Math.floor(Math.random() * (top + 1));
        tmp = array[current];
        array[current] = array[top];
        array[top] = tmp;
    }

    return array;
};

// the random sort function
var rnd = function() {
  return Math.round(Math.random())-0.5;
};
var randSort = function(A) {
  return A.sort(rnd);
};

var permutations = function(A) {
  if (A.length == 1) {
    return [A];
  }
  else {
    var perms = [];
    for (var i=0; i<A.length; i++) {
      var x = A.slice(i, i+1);
      var xs = A.slice(0, i).concat(A.slice(i+1));
      var subperms = permutations(xs);
      for (var j=0; j<subperms.length; j++) {
        perms.push(x.concat(subperms[j]));
      }
    }
    return perms;
  }
};

var test = function(A, iterations, func) {
  // init permutations
  var stats = {};
  var perms = permutations(A);
  for (var i in perms){
    stats[""+perms[i]] = 0;
  }

  // shuffle many times and gather stats
  var start=new Date();
  for (var i=0; i<iterations; i++) {
    var shuffled = func(A);
    stats[""+shuffled]++;
  }
  var end=new Date();

  // format result
  var arr=[];
  for (var i in stats) {
    arr.push(i+" "+stats[i]);
  }
  return arr.join("\n")+"\n\nTime taken: " + ((end - start)/1000) + " seconds.";
};

alert("random sort: " + test([1,2,3,4], 100000, randSort));
alert("shuffle: " + test([1,2,3,4], 100000, shuffle));

Rene Saarsoo 2009-06-08 08:17:20

+1 for benchmarking instead of subjective arguing.

PatrikAkerstrand 2009-06-08 08:32:37

Answer 7

+3 A:

Interestingly, Microsoft used the same technique in their pick-random-browser-page.

They used a slightly different comparison function:

function RandomSort(a,b) {
    return (0.5 - Math.random());
}

Looks almost the same to me, but it turned out to be not so random...

So I made some testruns again with the same methodology used in the linked article, and indeed - turned out that the random-sorting-method produced flawed results. New test code here:

function shuffle(arr) {
  arr.sort(function(a,b) {
    return (0.5 - Math.random());
  });
}

function shuffle2(arr) {
  arr.sort(function(a,b) {
    return (Math.round(Math.random())-0.5);
  });
}

function shuffle3(array) {
  var tmp, current, top = array.length;

  if(top) while(--top) {
    current = Math.floor(Math.random() * (top + 1));
    tmp = array[current];
    array[current] = array[top];
    array[top] = tmp;
  }

  return array;
}

var counts = [
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0]
];

var arr;
for (var i=0; i<100000; i++) {
  arr = [0,1,2,3,4];
  shuffle3(arr);
  arr.forEach(function(x, i){ counts[x][i]++;});
}

alert(counts.map(function(a){return a.join(", ");}).join("\n"));

Rene Saarsoo 2010-03-01 22:43:59

ansaurus

tags:

views:

answers:

Is it correct to use JavaScript Array.sort() method for shuffling?

related questions