views:

319

answers:

6

I have this array, for example (the size is variable):

   x = ["1.111", "1.122", "1.250", "1.111"]

and I need to find the most commom value ("1.111" in this case).

Is there an easy way to do that?

Tks in advance!


EDIT #1: Thank you all for the answers!


EDIT #2: I've changed my accepted answer based on Z.E.D.'s information. Thank you all again!

+2  A: 

You could sort the array and then loop over it once. In the loop just keep track of the current item and the number of times it is seen. Once the list ends or the item changes, set max_count == count if count > max_count. And of course keep track of which item has the max_count.

Justin Ethier
+1  A: 

You could create a hashmap that stores the array items as keys with their values being the number of times that element appears in the array.

Pseudo Code:

["1.111", "1.122", "1.250", "1.111"].each { |num|
  count=your_hash_map.get(num)
  if(item==nil)
    hashmap.put(num,1)
  else
    hashmap.put(num,count+1)
}

As already mentioned, sorting might be faster.

Jeremy
Why would sorting be faster? Sorting is O(n log n) at best while this is O(n)
Pyrolistical
Correction, comparison-based sorting is O(n log n). There are linear sorts, like bucket sort or radix sort.EDIT: you do usually have to have certain types of data for bucket sort or radix sort to really be more efficient than comparison sorts. What they make up for in time they usually gobble up in space. FTR, the pseudo code above is bucket sorting.
saramah
+13  A: 
#!/usr/bin/ruby1.8

def most_common_value(a)
  a.group_by do |e|
    e
  end.values.max_by(&:size).first
end

x = ["1.111", "1.122", "1.250", "1.111"]
p most_common_value(x)    # => "1.111"

Note: Enumberable.max_by is new with Ruby 1.9, but it has been backported to 1.8.7

Or as Enumerable#mode:

Enumerable.class_eval do
  def mode
    group_by do |e|
      e
    end.values.max_by(&:size).first
  end
end

["1.111", "1.122", "1.250", "1.111"].mode
# => "1.111"
Wayne Conrad
I'm impressed with the speed up over the usual way I'd do this. Nice job.
Greg
@Wayne Conrad, uber solution. +1
macek
Here's a shorter version:x.group_by { |e| e }.values.max_by(-)
Michael Kohl
@Michael Kohl, Good one. I'll edit my answer to use the new... whatever you call it... trick.
Wayne Conrad
@wayne: The "trick" is the method Symbol#to_proc.
Michael Kohl
+1  A: 

Using the default value feature of hashes:

>> x = ["1.111", "1.122", "1.250", "1.111"]
>> h = Hash.new(0)
>> x.each{|i| h[i] += 1 }
>> h.max{|a,b| a[1] <=> b[1] }
["1.111", 2]
jleedev
This was selected as the answer, but look at the benchmark results I have, displayed below.
Greg
+2  A: 

One pass through the hash to accumulate the counts. Use .max() to find the hash entry with the largest value.

#!/usr/bin/ruby

a = Hash.new(0)
["1.111", "1.122", "1.250", "1.111"].each { |num|
  a[num] += 1
}

a.max{ |a,b| a[1]  b[1] } # => ["1.111", 2]

or, roll it all into one line:

ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] } # => ["1.111", 2]

If you only want the item back add .first():

ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] }.first # => "1.111"

The first sample I used is how it would be done in Perl usually. The second is more Ruby-ish. Both work with older versions of Ruby. I wanted to compare them, plus see how Wayne's solution would speed things up so I tested with benchmark:

#!/usr/bin/env ruby

require 'benchmark'

ary = ["1.111", "1.122", "1.250", "1.111"] * 1000 

def most_common_value(a)
  a.group_by { |e| e }.values.max_by { |values| values.size }.first
end

n = 1000
Benchmark.bm(20) do |x|
  x.report("Hash.new(0)") do
    n.times do 
      a = Hash.new(0)
      ary.each { |num| a[num] += 1 }
      a.max{ |a,b| a[1]  b[1] }.first
    end 
  end

  x.report("inject:") do
    n.times do
      ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] }.first
    end
  end

  x.report("most_common_value():") do
    n.times do
      most_common_value(ary)
    end
  end
end

Here's the results:

                          user     system      total        real
Hash.new(0)           2.150000   0.000000   2.150000 (  2.164180)
inject:               2.440000   0.010000   2.450000 (  2.451466)
most_common_value():  1.080000   0.000000   1.080000 (  1.089784)
Greg
very very nice! thank you very much for this information... actually I was reading about `benchmark` to do that. thank you again.
j.
It shows why benchmarking is important. I assumed using inject would be faster than looping over the array using each, but Wayne's solution cut the time in half.
Greg
@Z.E.D., I'm getting a syntax error, `unexpected tIDENTIFIER, expecting '}'` on line 15, `a.max{ |a,b| a[1] b[1] }.first`, caret at `b[`. (Ruby 1.9.1).
macek
@smotchkkiss, change the `a.max{ |a,b| a[1] b[1] }` to `a.max{ |a,b| a[1] <=> b[1] }`
j.
@j., is this an error in the markdown rendering?
macek
Yes, it's an error. Either the markdown or browser thinks the <=> is a tag and hides it so I had to change it to <
Greg
A: 

It will return most popular value in array

x.group_by{|a| a }.sort_by{|a,b| b.size<=>a.size}.first[0]

IE:

x = ["1.111", "1.122", "1.250", "1.111"]
# Most popular
x.group_by{|a| a }.sort_by{|a,b| b.size<=>a.size}.first[0]
#=> "1.111
# How many times
x.group_by{|a| a }.sort_by{|a,b| b.size<=>a.size}.first[1].size
#=> 2
fl00r