views:

4028

answers:

5

I've heard of a few ways to implement tagging; using a mapping table between TagID and ItemID (makes sense to me, but does it scale?), adding a fixed number of possible TagID columns to ItemID (seems like a bad idea), Keeping tags in a text column that's comma separated (sounds crazy but could work). I've even heard someone recommend a sparse matrix, but then how do the tag names grow gracefully?

Am I missing a best practice for tags?

+75  A: 

Three tables (one for storing all items, one for all tags, and one for the relation between the two), properly indexed, with foreign keys set running on a proper database, should work well and scale properly.

Table: Item
Columns: ItemID, Title, Content

Table: Tag
Columns: TagID, Title

Table: ItemTag
Columns: ItemID, TagID
Yaakov Ellis
I don't think this can be upvoted enough. This is clearly the best way to do it.
BobbyShaftoe
Clearly? Where is the evidence that this is clearly the best way to do this (except for the fact that it is Normalized - also, remember normalization is fast inserts, slow selects).
Redbeard 0x0A
Got any documentation on this approach? I believe this would also be grate with indexes instead of the comma separed text column
Cesar
This is known as the “Toxi” solution, you can find additional information about it here : http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
The Pixel Developer
The funny thing is that, everybody thinks taggging is very complicated. This is all there is to it. Unless ur google ....
Cherian
+2  A: 

I've always kept the tags in a separate table and then had a mapping table. Of course I've never done anything on a really large scale either.

Having a "tags" table and a map table makes it pretty trivial to generate tag clouds & such since you can easily put together SQL to get a list of tags with counts of how often each tag is used.

Mark Biek
+3  A: 

Agreed with Yakkov or you'll next question will be this.
http://stackoverflow.com/questions/20484/use-a-like-clause-in-part-of-an-inner-join

jms
+3  A: 

Use a single formatted text column[1] for storing the tags and use a capable full text search engine to index this. Else you will run into scaling problems when trying to implement boolean queries.

If you need details about the tags you have, you can either keep track of it in a incrementally maintained table or run a batch job to extract the information.

[1] Some RDBMS even provide a native array type which might be even better suited for storage by not needing a parsing step, but might cause problems with the full text search.

David Schmitt
+9  A: 

If you are using a database that supports map-reduce, like couchdb, storing tags in a plain text field or list field is indeed the best way. Example:

tagcloud: {
  map: function(doc){ 
    for(tag in doc.tags){ 
      emit(doc.tags[tag],1) 
    }
  }
  reduce: function(keys,values){
    return values.length
  }
}

Running this with group=true will group the results by tag name, and even return a count of the number of times that tag was encountered. It's very similar to counting the occurrences of a word in text.

Nick Retallack