As you can see in the current code below, I am finding the duplicate based on the attribute recordable_id. What I need to do is find the duplicate based on four matching attributes: user_id, recordable_type, hero_type, recordable_id. How must I modify the code?

heroes = User.heroes

for hero in heroes
  hero_statuses = hero.hero_statuses

  seen = []

  hero_statuses.sort! {|a,b| a.created_at <=> b.created_at } # sort by created_at
  hero_statuses.each do |hero_status|
    if seen.map(&:recordable_id).include? hero_status.recordable_id # check if the id has been seen already
      hero_status.revoke
    else
      seen << hero_status # if not, add it to the seen array
    end
  end
end
A: 

Try this:

HeroStatus.all(:group =>  "user_id, recordable_type, hero_type, recordable_id",
               :having => "count(*) > 1").each do |status|
  status.revoke 
end

Edit 2: To revoke all the duplicate entries created after the first one, do the following:

HeroStatus.all(:joins => "(
     SELECT   user_id, recordable_type, hero_type, 
              recordable_id, MIN(created_at) AS created_at
     FROM     hero_statuses
     GROUP BY user_id, recordable_type, hero_type, recordable_id
     HAVING   COUNT(*) > 1
   ) AS A ON A.user_id         = hero_statuses.user_id         AND 
             A.recordable_type = hero_statuses.recordable_type AND
             A.hero_type       = hero_statuses.hero_type       AND
             A.recordable_id   = hero_statuses.recordable_id   AND
             A.created_at      < hero_statuses.created_at
").each do |status|
  status.revoke 
end
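
To make the intent of that join easier to check, here is a rough plain-Ruby sketch of the same selection rule: group by the four attributes, find the earliest created_at in each group, and pick every row that is strictly newer. The Status struct, the later_duplicates helper, and the sample data are hypothetical stand-ins, not part of the original models, and integers stand in for timestamps.

```ruby
# Hypothetical stand-in for the HeroStatus model (the real one is an ActiveRecord class).
Status = Struct.new(:id, :user_id, :recordable_type, :hero_type, :recordable_id, :created_at)

# Return every row that is strictly newer than the earliest row of its group.
def later_duplicates(statuses)
  groups = statuses.group_by do |s|
    [s.user_id, s.recordable_type, s.hero_type, s.recordable_id]
  end
  groups.values.flat_map do |group|
    earliest = group.map(&:created_at).min # plays the role of MIN(created_at)
    group.select { |s| s.created_at > earliest }
  end
end

# Made-up sample rows; created_at is an integer here purely for illustration.
SAMPLE_STATUSES = [
  Status.new(1, 7, "Post", "author", 42, 100),
  Status.new(2, 7, "Post", "author", 42, 200),
  Status.new(3, 7, "Post", "editor", 42, 150),
  Status.new(4, 7, "Post", "author", 42, 300),
]

LATER_IDS = later_duplicates(SAMPLE_STATUSES).map(&:id)
```

As with the SQL comparison, rows that share the exact earliest created_at are all left alone, so ties are not treated as duplicates in this sketch.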
KandadaBoggu
Nice. This is getting me closer. Here's one more wrinkle that I forgot to mention. I want to get rid of the duplicates with the most-recent created_at date, leaving the record that was initially created.
keruilin
Updated the answer, take a look.
KandadaBoggu
thx kan! this works great. i was looking for ruby-based answer, so going to check off marc's. but appreciate your assistance!
keruilin
I didn't give you a Ruby-based solution because it is very inefficient for your use case. This is something the DB does best. You can use the Ruby approach for a small data set (< 1000 statuses); otherwise the DB approach is better.
KandadaBoggu
A: 

Using straight Ruby (not the SQL server):

heroes = User.heroes

for hero in heroes
  hero_statuses = hero.hero_statuses

  seen = {}

  hero_statuses.sort_by!(&:created_at)
  hero_statuses.each do |status|
    key = [status.user_id, status.recordable_type, status.hero_type, status.recordable_id]
    if seen.has_key?(key)
      status.revoke
    else
      seen[key] = status # if not, remember it in the seen hash
    end
  end

  remaining = seen.values
end

For lookups, always use a Hash (or a Set; here I used a Hash so that the statuses that were kept remain available afterwards).

Note: I used sort_by!, but that's new to 1.9.2, so on older Rubies use sort_by and assign the result (or require "backports").
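
If the kept statuses are not needed afterwards, the Set variant mentioned above could look roughly like the sketch below. It relies on Set#add? returning nil when the element is already present. FakeStatus and revoke_later_duplicates are hypothetical names used only for this illustration; revoke here just flips a flag, and integers stand in for timestamps.

```ruby
require "set"

# Hypothetical stand-in for a HeroStatus record; revoke only flips a flag here.
FakeStatus = Struct.new(:user_id, :recordable_type, :hero_type,
                        :recordable_id, :created_at, :revoked) do
  def revoke
    self.revoked = true
  end
end

# Revoke every status whose four-attribute key already appeared on an older row.
def revoke_later_duplicates(statuses)
  seen = Set.new
  statuses.sort_by(&:created_at).each do |status|
    key = [status.user_id, status.recordable_type, status.hero_type, status.recordable_id]
    # Set#add? returns nil when the key is already in the set.
    status.revoke unless seen.add?(key)
  end
end

# Made-up sample rows, deliberately out of chronological order.
DEMO = [
  FakeStatus.new(1, "Post", "author", 42, 200, false),
  FakeStatus.new(1, "Post", "author", 42, 100, false),
  FakeStatus.new(1, "Post", "editor", 42, 150, false),
]
revoke_later_duplicates(DEMO)
DEMO_REVOKED = DEMO.map(&:revoked)
```

Only the first row is revoked: it shares its key with the second row, which is older, while the third row differs in hero_type.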

Marc-André Lafortune