In my application I am trying to display only the unique elements of an array of ActiveRecord objects (loss_reports), where uniqueness is based on two attributes of a loss_report.

Schema

class Agent < ActiveRecord::Base
  has_many :loss_reports, :through => :policy  # :through takes the association name as a symbol
end

class LossReport < ActiveRecord::Base
  belongs_to :agent  
end

I first tried to override the eql? and hash methods of LossReport so that I could do something similar to:

Option 1:

class LossReport ...
  def eql? other
    # compare both attributes value to value (not a hash against a value)
    policy_id == other.policy_id && loss_occurred_on == other.loss_occurred_on
  end

  def hash 
    policy_id + loss_occurred_on.hash
  end
end  

class Agent ...
  def unique_losses
    loss_reports.to_set
  end
end

but I quickly removed the code because ActiveRecord already overrides these methods and I was not sure of the repercussions.

Option 2:

class Agent ...
  def unique_losses
    loss_reports.sort{|l1,l2| l2.created_at <=> l1.created_at}.group_by{|l| (l.policy_id + l.loss_occurred_on.hash)}.collect{|hl| hl[-1].first}
  end
end  

Option 3:

 class Agent
   def unique_losses
     hsh_array = []
     unique = []
     loss_reports.sort{|l1,l2| l2.created_at <=> l1.created_at}.each do |l|
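       # l.hsh is not shown here; presumably policy_id + loss_occurred_on.hash, as inlined in the benchmark
       unique << l unless hsh_array.include?(l.hsh)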
       hsh_array << l.hsh
     end
     unique         
   end
 end

Benchmark results:

Benchmark.bmbm do |bm|
  bm.report("option 2") do
    losses.sort{|l1,l2| l2.created_at <=> l1.created_at}.group_by{|l| (l.policy_id +  l.loss_occurred_on.hash)}.collect{|hl| hl[-1].first}
  end
  bm.report("option 3") do
    hsh_array,unique = [],[]
    losses.sort{|l1,l2| l2.created_at <=> l1.created_at}.each do |l|
      unique << l unless hsh_array.include?(l.policy_id+l.loss_occurred_on.hash)
      hsh_array << l.policy_id + l.loss_occurred_on.hash
    end
  end
end
Rehearsal --------------------------------------------
option 2   0.400000   0.000000   0.400000 (  0.407615)
option 3   0.250000   0.000000   0.250000 (  0.254399)
----------------------------------- total: 0.650000sec

               user     system      total        real
option 2   0.400000   0.000000   0.400000 (  0.403535)
option 3   0.250000   0.000000   0.250000 (  0.262578)

Neither way feels right, but both work. Which is the better option, or is there an even better way?

+1  A: 

I have no idea about benchmarks, but it seems like inject would be the simplest way:

loss_reports.inject([]) do |arr, report|
  arr << report unless arr.detect{|r| ... }
  arr  # return the accumulator; a skipped `unless` would otherwise hand nil to the next iteration
end
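
To make that concrete, here is a minimal runnable sketch of the inject approach. ReportStub is a hypothetical Struct standing in for the ActiveRecord model, and the duplicate test (same policy_id and same loss_occurred_on) is my assumption about what belongs inside the detect block, based on the attributes used in the question.

```ruby
# ReportStub is a hypothetical stand-in for LossReport, not the real model.
ReportStub = Struct.new(:id, :policy_id, :loss_occurred_on)

def unique_losses(reports)
  reports.inject([]) do |arr, report|
    # Assumed duplicate test: same policy and same loss date.
    duplicate = arr.detect do |r|
      r.policy_id == report.policy_id &&
        r.loss_occurred_on == report.loss_occurred_on
    end
    arr << report unless duplicate
    arr # always hand the accumulator to the next iteration
  end
end

reports = [
  ReportStub.new(1, 10, "2010-01-05"),
  ReportStub.new(2, 10, "2010-01-05"), # same policy and date as id 1
  ReportStub.new(3, 10, "2010-02-01"),
  ReportStub.new(4, 11, "2010-01-05")
]

puts unique_losses(reports).map(&:id).inspect
```

Note that detect rescans the accumulated array for every report, so this is quadratic in the worst case, much like the include? check in Option 3.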

Or, maybe even better, define a named_scope with a custom SQL GROUP BY.
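
I have not written the named_scope itself, but as a rough illustration of what grouping on the two attributes gives you, here is a plain-Ruby sketch. LossStub and newest_per_group are hypothetical names, and keeping the most recently created report per group is my assumption, taken from the created_at sort in the question. The group key is the array [policy_id, loss_occurred_on] rather than a sum, so two different pairs that happen to add up to the same number are not merged.

```ruby
# LossStub is a hypothetical stand-in for LossReport, not the real model.
LossStub = Struct.new(:id, :policy_id, :loss_occurred_on, :created_at)

def newest_per_group(losses)
  losses
    .group_by { |l| [l.policy_id, l.loss_occurred_on] } # composite key
    .map { |_key, group| group.max_by(&:created_at) }   # newest in each group
end

losses = [
  LossStub.new(1, 10, "2010-01-05", 1),
  LossStub.new(2, 10, "2010-01-05", 2), # duplicate of id 1, created later
  LossStub.new(3, 11, "2010-01-05", 1)
]

puts newest_per_group(losses).map(&:id).sort.inspect
```

This runs in a single pass over the records, but it still loads every row into Ruby first; a database-side GROUP BY would avoid that.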

floyd
Great suggestion - found a similar question at http://stackoverflow.com/questions/2061389 with an example of using named_scope.