[1, 1, 1, 2, 3]
=> 1
['cat', 'dog', 'snake', 'dog']
=> dog
views:
1997answers:
6arr = [1, 1, 1, 2, 3]
freq = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
arr.sort_by { |v| freq[v] }.last
idx = {}
[2,2,1,3,1].each { |i| idx.include?(i) ? idx[i] += 1 : idx[i] = 1}
This is just a simple indexer. You could replace the [2,2,1..] array with any sort of symbol/string based identifier, this wouldn't work with objects, you'd need to introduce a bit more complexity, but this is simple enough.
rereading your questions, this solution is a bit over-engineered since its going to return you an index of all occurrences, not just the one with the most.
arr = [1, 1, 1, 2, 3]
arr.sort_by { |e| arr.grep(e).size }.last # => 1
Probably more effective as an instance method. Using Zach's solution:
class Array
def mode
sort_by {|i| grep(i).length }.last
end
end
While I adore the grep solution for its elegance and for reminding (or teaching) me about a method in Enumerable that I'd forgotten (or overlooked completely), it's slow, slow, slow. I agree 100% that creating the Array#mode
method is a good idea, however - this is Ruby, we don't need a library of functions that act on arrays, we can create a mixin that adds the necessary functions into the Array class itself.
But the inject(Hash) alternative uses a sort, which we also don't really need: we just want the value with the highest occurrence.
Neither of the solutions address the possibility that more than one value may be the mode. Maybe that's not an issue in the problem as stated (can't tell). I think I'd want to know if there was a tie, though, and anyway, I think we can improve a little on the performance.
require 'benchmark'
class Array
def mode1
sort_by {|i| grep(i).length }.last
end
def mode2
freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
sort_by { |v| freq[v] }.last
end
def mode3
freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
max = freq.values.max # we're only interested in the key(s) with the highest frequency
freq.select { |k, f| f == max } # extract the keys that have the max frequency
end
end
arr = Array.new(1_000) { |i| rand(100) } # something to test with
Benchmark.bm(30) do |r|
res = {}
(1..3).each do |i|
m = "mode#{i}"
r.report(m) do
100.times do
res[m] = arr.send(m).inspect
end
end
end
res.each { |k, v| puts "%10s = %s" % [k, v] }
end
And here's output from a sample run.
user system total real
mode1 34.375000 0.000000 34.375000 ( 34.393000)
mode2 0.359000 0.000000 0.359000 ( 0.359000)
mode3 0.219000 0.000000 0.219000 ( 0.219000)
mode1 = 41
mode2 = 41
mode3 = [[41, 17], [80, 17], [72, 17]]
The "optimised" mode3 took 60% of the time of the previous record-holder. Note also the multiple highest-frequency entries.
EDIT
A few months down the line, I noticed Nilesh's answer, which offered this:
def mode4
group_by{|i| i}.max{|x,y| x[1].length <=> y[1].length}[0]
end
It doesn't work with 1.8.6 out of the box, because that version doesn't have Array#group_by. ActiveSupport has it, for the Rails developers, although it seems about 2-3% slower than mode3 above. Using the (excellent) backports gem, though, produces a 10-12% gain, as well as delivering a whole pile of 1.8.7 and 1.9 extras.
The above applies to 1.8.6 only - and mainly only if installed on Windows. Since I have it installed, here's what you get from IronRuby 1.0 (on .NET 4.0):
========================== IronRuby =====================================
(iterations bumped to **1000**) user system total real
mode1 (I didn't bother :-))
mode2 4.265625 0.046875 4.312500 ( 4.203151)
mode3 0.828125 0.000000 0.828125 ( 0.781255)
mode4 1.203125 0.000000 1.203125 ( 1.062507)
So in the event that performance is super-critical, benchmark the options on your Ruby version & OS. YMMV.
Mike: I found a faster method. Try this:
class Array
def mode4
group_by{|i| i}.max{|x,y| x[1].length <=> y[1].length}[0]
end
end
The Benchmark output:
user system total real
mode1 24.340000 0.070000 24.410000 ( 24.526991)
mode2 0.200000 0.000000 0.200000 ( 0.195348)
mode3 0.120000 0.000000 0.120000 ( 0.118200)
mode4 0.050000 0.010000 0.060000 ( 0.056315)
mode1 = 76
mode2 = 76
mode3 = [[76, 18]]
mode4 = 76