tags:

views:

498

answers:

4

I think this is a great Ruby one-liner:

someArray.sort_by {rand}

It's concise, it's readable, and it works - but I don't quite understand how. Here's what I know:

  1. rand evaluates to a number between 0 and 1 (like 0.783468632804653)
  2. rand is being repeatedly evaluated in the code above, because assigning it to x first breaks the random sort
  3. sort_by {0.783468632804653}, or any other number I tried, has no effect on the array

ruby-doc.org wasn't much help to me in this case.

Can someone explain this step-by-step?

+4  A: 

The block rand yields a key that is used in sorting. It's different each time it's evaluated, so you get a random order.

When you put a single number in there, it's the same each time, so the order doesn't change. That means the sorting algorithm is 'stable' - it doesn't move in-order entries.

EDIT: and here's some even shorter, even clearer code:

someArray.shuffle
Peter
+1  A: 

sort_by is a refinement of sort, which is used like so:

people.sort do |person1, person2|
  person1 <=> person2
end

The sort function yields to the block when it need to know the order of two things (in this case, people). The block returns -1 if the left thing is less than the right thing, 0 if they are equal, and 1 if the right thing is larger than the left thing. The spaceship operator <=> has the wonderful property that it returns -1, 0 or +1, exactly what sort needs.

I haven't looked, but odds are good that Ruby is using the quicksort algorithm.

Some smart person noticed that we kept doing the same thing on the left side of the spaceship operator as we do on the right side, and came up with sort_by, used like so:

people.sort_by do |person|
  person.name
end

Instead of the sort algorithm giving two objects to the block and letting the block compare them, the algorithm gives a single object to the block. The block then returns whatever attribute or value should be used to do the sort. Ruby remembers the value the block returned for each element, and comparing those values, knows what order to put things in. It's neat that you don't have to repeat yourself anymore.

Your shuffle code works by just "making stuff up" when the sort algorithm yields to the block. Instead of returning something sensible, the block yields a random value. This causes the sort algorithm to sort things, well, randomly.

Wayne Conrad
A: 

What sort_by does, can be split up into 2 simple steps:

1) First, it calls the "map/collect" method on the provided array and with the provided block. In your case the result of it would be just an array of random numbers - let's call this intermediate array A1. Note, that it has the length of the initial array.

2) Then A1 gets sorted normally, but what is returned is not the sorted A1, but rather the original array, where the items are moved around the same way as the corresponding ones from A1, while it's being sorted!

That's how the following example works:

["Paulo", "Sergito", "Nick"].sort_by {|word| word.length} 

It sorts the words by their length, because first the array of words is mapped into an array of lengths, and those lengths are then sorted, while the words in the original array are moved around correspondingly.

Sergei Kozlov
+13  A: 

In Ruby 1.8/1.9 both sort and sort_by are implemented in C, this is a rough equivalent of how this works:

Say you start with [1,2,3,4] and call sort_by{rand}

Step 1 (I invented some random numbers):

An array of tuples is created: [[0.12232, 1],[0.53434, 2],[0.333, 3],[0.99, 4]]

In roughly equivalent Ruby code this is: [1,2,3,4].map{|x| [rand, x]}

Step 2

Ruby's quick sort is performed on the array based off the first element: (note the internal implementation is far from trivial and contains a ton of optimisations for already ordered arrays and such)

[[0.12232, 1],[0.333, 3],[0.53434, 2],[0.99, 4]]

In rough ruby this step is: ary.sort{|x,y| x[0] <=> y[0]}

Step 3

Pointers are copied from the new sorted array, to the correct position in the original array. [1,3,2,4]

In rough ruby this step is: ary.map{|x,y| y}

This technique is sometimes referred to as Schwartzian Transform. Caching means that the expensive operation is not performed more than N times. Meaning, this is a very efficient way of randomizing an array.

Note: array.shuffle! will be the most efficient built-in way to shuffle an array (in-place) since it uses a modern version of Fisher-Yates

static VALUE
rb_ary_shuffle_bang(VALUE ary)
{
    long i = RARRAY_LEN(ary);

    rb_ary_modify(ary);
    while (i) {
  long j = rb_genrand_real()*i;
  VALUE tmp = RARRAY_PTR(ary)[--i];
  RARRAY_PTR(ary)[i] = RARRAY_PTR(ary)[j];
  RARRAY_PTR(ary)[j] = tmp;
    }
    return ary;
}
Sam Saffron
Awesome, I was wondering whether `sort_by` used a Schwartizian Transform. Thanks!
kejadlen
Rubinius, JRuby and IronRuby all have slightly different implementations. But I think it is safe to assume the same principals apply.
Sam Saffron
A great answer. You were clear, while pointing out reasons why the real thing is more complicated behind the scenes. Thanks!
Nathan Long