views:

860

answers:

4

I have a pretty simple HABTM set of models

class Tag < ActiveRecord::Base 
   has_and_belongs_to_many :posts
end 

class Post < ActiveRecord::Base 
   has_and_belongs_to_many :tags

   def tags= (tag_list) 
      self.tags.clear 
      tag_list.strip.split(' ').each do 
        self.tags.build(:name => tag) 
      end
   end 
end

Now it all works alright except that I get a ton of duplicates in the Tags table.

What do I need to do to avoid duplicates (bases on name) in the tags table?

+1  A: 

You can pass the :uniq option as described in the documentation. Also note that the :uniq options doesn't prevent the creation of duplicate relationships, it only ensures accessor/find methods will select them once.

If you want to prevent duplicates in the association table you should create an unique index and handle the exception. Also validates_uniqueness_of doesn't work as expected because you can fall into the case a second request is writing to the database between the time the first request checks for duplicates and writes into the database.

Simone Carletti
I tried that, it does not really solve the problem cleanly. I want to avoid exceptions in the first place. I think my solution kind of does that though it needs a bit of tweaking.
Sam Saffron
Note: I already had the unique constraint, which resulted in exceptions, handling them is a huge pain.
Sam Saffron
+1  A: 

I worked around this by creating a before_save filter that fixes stuff up.

class Post < ActiveRecord::Base 
   has_and_belongs_to_many :tags
   before_save :fix_tags

   def tag_list= (tag_list) 
      self.tags.clear 
      tag_list.strip.split(' ').each do 
        self.tags.build(:name => tag) 
      end
   end  

    def fix_tags
      if self.tags.loaded?
        new_tags = [] 
        self.tags.each do |tag|
          if existing = Tag.find_by_name(tag.name) 
            new_tags << existing
          else 
            new_tags << tag
          end   
        end

        self.tags = new_tags 
      end
    end

end

It could be slightly optimised to work in batches with the tags, also it may need some slightly better transactional support.

Sam Saffron
This is exactly what the validates_uniqueness_of validator does. However, this solution alone doesn't prevent duplicate items because between your find_by_name and the association, an other request might write to the database. You can use this but you must be aware that some exceptions can occur (because you have an index) and you should catch them.
Simone Carletti
not really http://github.com/rails/rails/blob/b6bac73b282c7e500c43810f2a937fc0047e5979/activerecord/lib/active_record/validations.rb it is a validation that will exception out if there is a dupe, it does not handle it transparently, it has documented concurrency issues, and does need to be used with a unique index to guarantee no dupes.
Sam Saffron
+1  A: 

I would prefer to adjust the model and create the classes this way:

class Tag < ActiveRecord::Base 
   has_many :taggings
   has_many :posts, :through => :taggings
end 

class Post < ActiveRecord::Base 
   has_many :taggings
   has_many :tags, :through => :taggings
end

class Tagging < ActiveRecord::Base 
   belongs_to :tag
   belongs_to :post
end

Then I would wrap the creation in logic so that Tag models were reused if it existed already. I'd probably even put a unique constraint on the tag name to enforce it. That makes it more efficient to search either way since you can just use the indexes on the join table (to find all posts for a particular tag, and all tags for a particular post).

The only catch is that you can't allow renaming of tags since changing the tag name would affect all uses of that tag. Make the user delete the tag and create a new one instead.

Jeff Whitmire
+1  A: 

Set the uniq option:

class Tag < ActiveRecord::Base 
   has_and_belongs_to_many :posts , :uniq => true
end 

class Post < ActiveRecord::Base 
   has_and_belongs_to_many :tags , :uniq => true
Joshua Cheek