A colleague needed to sort an array of ActiveRecord objects in a Rails app. He tried the obvious Array#sort!, but it seemed surprisingly slow, taking 32 s for an array of 3,700 objects. In case it was these big fat objects slowing things down, he reimplemented the sort by sorting an array of small objects, then reordering the original array of ActiveRecord objects to match, as shown in the code below. Tada! The sort now takes 700 ms.

That really surprised me. Does Ruby's sort method end up copying objects about the place rather than just references? He's using Ruby 1.8.6/7.

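def self.sort_events(events)
  # Wrap each event in a lightweight EventSorter that remembers its original index.
  event_sorters = Array.new(events.length) {|i| EventSorter.new(i, events[i])}
  event_sorters.sort!
  # Map the sorted wrappers back to the original events.
  event_sorters.collect {|es| events[es.index]}
end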

private

# Class used by sort_events
class EventSorter
  attr_reader :sqn
  attr_reader :time
  attr_reader :index

  def initialize(index, event)
    @index = index  
    @sqn   = event.sqn
    @time  = event.time  
  end

  def <=>(b)
    @time != b.time ? @time <=> b.time : @sqn <=> b.sqn
  end
end
A: 

Nothing answers questions like this better than the actual language source code. Array#sort! uses sort_internal() which is defined in array.c:

sort_internal()

(Yes, I know those are the sources for 1.8.4, but I can't find the 1.8.6 ones online right now and am pretty sure this hasn't changed.)

Michael Kohl
Go on - give me a clue! I'm not sufficiently fluent in C to make much of this.
David Waller
Oh, sorry for that! It basically uses quicksort, which is O(N log N) in the best and average cases and O(N^2) in the worst case.
Michael Kohl
But that doesn't seem to explain why it's slower sorting an array of large objects rather than an array of small objects. Does the implementation require copying the objects around the heap rather than simply rearranging pointers?
David Waller
+2  A: 

sort definitely does not copy the objects. One difference that I can imagine between the code using EventSorter and the code without it (which you didn't supply, so I have to guess) is that EventSorter calls event.sqn and event.time exactly once and stores the result in variables. During the sorting only the variables need to be accessed. The original version presumably called sqn and time each time the sort-block was invoked.

If this is the case, it can be fixed by using sort_by instead of sort. sort_by only calls the block once per object and then uses the cached results of the block for further comparisons.
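
To make the difference concrete, here is a minimal sketch that counts attribute reads under sort and under sort_by. FakeEvent is a hypothetical stand-in for the ActiveRecord model (the original Event class was not posted), and its counter exists only for illustration.

```ruby
# Hypothetical stand-in for an ActiveRecord model: each attribute read
# bumps a counter so we can see how often the readers are called.
class FakeEvent
  class << self
    attr_accessor :reads
  end
  self.reads = 0

  def initialize(time, sqn)
    @time = time
    @sqn = sqn
  end

  def time
    FakeEvent.reads += 1
    @time
  end

  def sqn
    FakeEvent.reads += 1
    @sqn
  end
end

# 1000 events with scrambled times and unique sequence numbers.
events = Array.new(1000) { |i| FakeEvent.new((i * 7919) % 100, i) }

# sort with a block: the readers run again on every comparison.
FakeEvent.reads = 0
by_sort = events.sort { |a, b| [a.time, a.sqn] <=> [b.time, b.sqn] }
reads_with_sort = FakeEvent.reads

# sort_by: the block runs once per element and the keys are cached.
FakeEvent.reads = 0
by_sort_by = events.sort_by { |e| [e.time, e.sqn] }
reads_with_sort_by = FakeEvent.reads

puts "sort:    #{reads_with_sort} attribute reads"
puts "sort_by: #{reads_with_sort_by} attribute reads"
```

With sort_by, each reader is called exactly once per element; with sort, the count grows with the number of comparisons, which is where an expensive reader hurts.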

sepp2k
You guessed right: Event has an almost identical <=> method to EventSorter, but in the case of Event, sqn and time are the names of columns in the database. That means that Rails/ActiveRecord provides the sqn and time methods, which, it seems, parse the values in the ActiveRecord attributes hash every time they're called. So every time Event#<=> was called, ActiveRecord was parsing a time string into a Ruby Time object, hence the horrible performance. Mystery solved! Thank you.
David Waller
+1  A: 

Just as an explanation of what is likely happening and how to deal with it...

Sorting tends to look at an element multiple times so an expensive lookup into the object or structure will become very costly very quickly.

A Schwartzian Transform is commonly used when sorting arrays of complex objects or structures. The basic idea is to pre-compute a simple value that accurately reflects the big structure or object, then sort the values, then use the resulting sorted array to refer back to the thing they came from.

http://en.wikipedia.org/wiki/Schwartzian_transform
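
As a rough illustration, the three steps (decorate, sort, undecorate) might look like this in Ruby. Record and its fields are hypothetical, chosen to mirror the time/sqn ordering from the question; in practice, sort_by does the same thing in one call.

```ruby
# Hypothetical record type standing in for a "complex object".
Record = Struct.new(:name, :time, :sqn)

records = [
  Record.new("c", 20, 1),
  Record.new("a", 10, 2),
  Record.new("b", 10, 1),
]

# 1. Decorate: pair each record with a cheap, precomputed key.
decorated = records.map { |r| [[r.time, r.sqn], r] }

# 2. Sort on the precomputed key only; the records are never queried again.
decorated = decorated.sort { |x, y| x.first <=> y.first }

# 3. Undecorate: drop the keys and keep the records in their new order.
sorted = decorated.map { |pair| pair.last }

puts sorted.map { |r| r.name }.join(", ")  # prints "b, a, c"
```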

Greg