tags:

views:

101

answers:

5

If i had a list of balls each of which has a color property. how can i cleanly get the list of balls with the most frequent color.

[m1,m2,m3,m4]

say,

        m1.color = blue
        m2.color = blue
        m3.color = red
        m4.color = blue

[m1,m2,m4] is the list of balls with the most frequent color

My Approach is to do:

[m1,m2,m3,m4].group_by{|ball| ball.color}.each do |samecolor|
  my_items = samecolor.count
end

where count is defined as

class Array
  def count
  k =Hash.new(0)
  self.each{|x|k[x]+=1}
  k
  end
end

my_items will be a hash of frequencies foreach same color group. My implementation could be buggy and i feel there must be a better and more smarter way. any ideas please?

+1  A: 

Your code isn't bad, but it is inefficient. If I were you I would seek a solution that iterates through your array only once, like this:

balls = [m1, m2, m3, m4]
most_idx = nil

groups = balls.inject({}) do |hsh, ball|
  hsh[ball.color] = [] if hsh[ball.color].nil?
  hsh[ball.color] << ball

  most_idx = ball.color if hsh[most_idx].nil? || hsh[ball.color].size > hsh[most_idx].size 
  hsh
end

groups[most_idx] # => [m1,m2,m4]

This does basically the same thing as group_by, but at the same time it counts up the groups and keeps a record of which group is largest (most_idx).

Jordan
Hmm, I'd want to benchmark it. Calling the core (implemented in C) methods certainly gives you great gains in performance.
glenn jackman
Inject is an extremely short method if you look at the source--it's just an assignment followed by an each and a return. If you want to use each directly instead you can, but you won't see any performance gains. Either way, iterating only once is what's important.http://ruby-doc.org/core/classes/Enumerable.src/M003140.html
Jordan
Your assumption that "iterating once is what's important" is quite wrong when it comes to Ruby. I tested both your algorithms in Ruby 1.8.7 with arrays of randomly generated balls increasing in size up to 1 million and yours was vastly slower than eastafri in every bracket. In general, combining the built-in functions in Ruby will be faster than rolling your own "optimized version."
Chuck
Just to clarify (since it's too late to edit): I'm not saying I think this is a bad solution. Just that you'll want to test with your particular Ruby implementation on your particular dataset to make sure that the algorithm's complexity isn't outweighed by other factors.
Chuck
Fair enough. I appreciate your rigor.
Jordan
Thanks Everyone for the discussion and the advice!
eastafri
A: 

How about:

color,balls = [m1,m2,m3,m4].group_by { |b| b.color }.max_by(&:size)

ezpz
Mladen Jablanović
@Mladen: updated to use `max_by`
ezpz
Note: This assumes Ruby 1.9.1
Greg
1.8.7 actually.
Mladen Jablanović
and the question already uses `group_by`, so we can assume at least 1.8.7 is OK
glenn jackman
A: 
myhash = {}

mylist.each do |ball|
  if myhash[ball.color]
    myhash[ball.color] += 1
  else
    myhash[ball.color] = 1    
  end
end

puts myhash.sort{|a,b| b[1] <=> a[1]}
Beanish
You can instantiate a hash using `myhash=Hash.new(0)` so you don't need to check whether a value with a given key already exists, just increment it.
Mladen Jablanović
+2  A: 

You found group_by but missed max_by

max_color, max_balls = [m1,m2,m3,m4].group_by {|b| b.color}.max_by {|color, balls| balls.length}
glenn jackman
A: 

Here's how I'd do it. The basic idea uses inject to accumulate the values into a hash, and comes from "12 - Building a Histogram" in "The Ruby Cookbook".

#!/usr/bin/env ruby

class M
  attr_reader :color
  def initialize(c)
    @color = c
  end
end

m1 = M.new('blue')
m2 = M.new('blue')
m3 = M.new('red')
m4 = M.new('blue')

hash = [m1.color, m2.color, m3.color, m4.color].inject(Hash.new(0)){ |h, x| h[x] += 1; h } # => {"blue"=>3, "red"=>1}
hash = [m1, m2, m3, m4].inject(Hash.new(0)){ |h, x| h[x.color] += 1; h } # => {"blue"=>3, "red"=>1}

There are two different ways to do it, depending on how much knowledge you want the inject() to know about your objects.

Greg