tags:

views:

430

answers:

5

C# has BitArray, C has bit fields.. I couldn't find an equivalent in the Ruby core. Google showed me a BitField class that Peter Cooper has written for the same.

I've been reading Jon Bentley's Programming Pearls and while trying out one of the first examples, which deals with a BitMap sort - I needed a type that is an array of bits. I used Peter's class

class BitMapSort
  def self.sort(list, value_range_max)
    bitmap = BitField.new(value_range_max)

    list.each{|x| bitmap[x] = 1 }

    sorted_list = []
    0.upto(value_range_max-1){ |number|
      sorted_list << number if (bitmap[number] == 1)
    }
    sorted_list
  end
end

Running this on a set of 1M unique numbers in the range [0, 10,000,000), produced some interesting results,

                         user     system      total        real
bitmap               11.078000   0.015000  11.093000 ( 11.156250)
ruby-system-sort      0.531000   0.000000   0.531000 (  0.531250)
quick-sort           21.562000   0.000000  21.562000 ( 21.625000)

Benchmark.bm(20){|x|
    x.report("bitmap"){ ret = BitMapSort.sort(series, 10_000_000);}
    x.report("ruby-system-sort"){ ret = series.sort; }
    x.report("quick-sort"){ ret = QuickSort.sort( series, 0, series.length-1); }
    }

How is ruby's default sort 22x faster than 1M BitField.set + 1 loop over a 10M bit vector ? Is there a more efficient bit field / array in Ruby ? How does Ruby's default sort achieve this level of performance.. is it jumping into C to get this done?

+2  A: 

The reason why it's so much faster is probably because it's implemented in the ruby implementation in C.

jonnii
+7  A: 

How does Ruby's default sort achieve this level of performance.. is it jumping into C to get this done?

All of the core classes and methods in ruby's default implementation are implemented in C.

sepp2k
+2  A: 

I think the real problem here is you're doing 10M comparisons, 10M array fetches, 10M of a lot of things, whereas a properly optimized sort routine is doing far fewer operations since it is working with a fixed set of 1M items.

Basic operations such as sort are highly optimized in the Ruby VM and are difficult to beat with a pure-Ruby alternative.

tadman
+6  A: 

Array#sort is implemented in C, see rb_ary_sort in array.c

It also has some checks to compare Fixnums so sorting an array of Integers doesn't even need method lookups.

levinalex
Which begs the question.. if I wanted to write BitMapSort (since it is more appropriate to the input list here - unique nos from a constrained range) which performs better than Array#sort, is my only option writing code in C? I repeated this exercise in pure C# , BitMapSort trumps Array.Sort there. If I pick 1M nos from a range of [0, 2M) instead of [0, 10M) it halves the time. Interesting..
Gishu
the main thing is method call overhead. Method calls in ruby 1.8 are really slow and not cached at all.
levinalex
+2  A: 

Jumping into C is exactly correct. Array and Hash both have C implementations of many methods in order to improve performance. Integer and float literals also have some tricky code optimizations. When you convert them to a bitmap you loose this optimization as well.

With a compiled language like C or Java it really makes sense to look for tricky optimization patterns. with interpreted languages the cost of interpreting each command make this counter productive.

John F. Miller