Hi, I was wondering what the best way is to implement a tag system like the one used on SO. I have been thinking about this, but I can't come up with a good, scalable solution.

What I had in mind is the basic three-table solution: a tags table, an 'articles' table, and a tag_to_articles table.
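For reference, the three-table layout described above might look like the following minimal sketch. It assumes SQLite through Python's built-in sqlite3 module; the column names and the helper functions add_article and articles_with_tag are made up for illustration and are not taken from any particular site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags     (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Junction table: one row per (tag, article) pair.
CREATE TABLE tag_to_articles (
    tag_id     INTEGER NOT NULL REFERENCES tags(id),
    article_id INTEGER NOT NULL REFERENCES articles(id),
    PRIMARY KEY (tag_id, article_id)
);
""")

def add_article(title, tag_names):
    """Insert an article and link it to each tag, creating tags as needed."""
    article_id = conn.execute(
        "INSERT INTO articles (title) VALUES (?)", (title,)
    ).lastrowid
    for name in tag_names:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO tag_to_articles (tag_id, article_id) VALUES (?, ?)",
            (tag_id, article_id),
        )
    return article_id

def articles_with_tag(name):
    """Return the titles of all articles carrying the given tag."""
    rows = conn.execute(
        """
        SELECT a.title
        FROM articles a
        JOIN tag_to_articles ta ON ta.article_id = a.id
        JOIN tags t ON t.id = ta.tag_id
        WHERE t.name = ?
        ORDER BY a.id
        """,
        (name,),
    )
    return [row[0] for row in rows]

add_article("Designing a tag system", ["database", "tagging"])
add_article("Indexing basics", ["database"])
print(articles_with_tag("database"))
```

The composite primary key on (tag_id, article_id) doubles as the index for tag-centric lookups, which is why the junction table can grow large without each lookup having to scan it.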

Is this still the best solution to this problem, or are there alternatives? With this method the tag_to_articles table would get extremely large over time, and I assume that is not great for searching. On the other hand, it is not that important that the query executes fast.

Thank you.

+1  A: 

The proposed solution is the best, if not the only practicable, way I can think of to address the many-to-many relationship between tags and articles. So my vote is for 'yes, it's still the best.' I'd be interested in any alternatives though.

David Thomas
I agree. The Tags and TagMap tables have a small record size and, when properly indexed, shouldn't hurt performance dramatically. Limiting the number of tags per item could also be a good idea.
PanJanek
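To make the indexing and tag-limit suggestions concrete, here is a rough sketch, again assuming SQLite via Python's sqlite3. The table name tagmap, the index name idx_tagmap_tag, the limit of 5, and both helper functions are illustrative assumptions. The primary key serves item-centric lookups, and the extra index serves tag-centric ones.

```python
import sqlite3

MAX_TAGS_PER_ITEM = 5  # assumed limit, chosen only for illustration

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tagmap (
    item_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL,
    PRIMARY KEY (item_id, tag_id)      -- covers "which tags does item X have?"
);
-- Reverse index: covers "which items have tag Y?"
CREATE INDEX idx_tagmap_tag ON tagmap (tag_id, item_id);
""")

def add_tag(item_id, tag_id):
    """Link a tag to an item; return False if the item is already at the limit."""
    count = conn.execute(
        "SELECT COUNT(*) FROM tagmap WHERE item_id = ?", (item_id,)
    ).fetchone()[0]
    if count >= MAX_TAGS_PER_ITEM:
        return False
    conn.execute(
        "INSERT OR IGNORE INTO tagmap (item_id, tag_id) VALUES (?, ?)",
        (item_id, tag_id),
    )
    return True

def tag_lookup_plan(tag_id):
    """Return SQLite's query-plan descriptions for a tag-centric lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT item_id FROM tagmap WHERE tag_id = ?",
        (tag_id,),
    )
    return [row[-1] for row in rows]

for tag_id in range(1, 8):
    add_tag(1, tag_id)          # the sixth and seventh calls are rejected
print(tag_lookup_plan(3))       # the plan text should mention idx_tagmap_tag
```

The limit is enforced in application code here; a trigger or constraint in the database would be another way to do it.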
+2  A: 

Nothing wrong with your three-table solution.

Another option is to limit the number of tags that can be applied to an article (like 5 in SO) and add those directly to your article table.

Normalizing the DB has its benefits and drawbacks, just like hard-wiring things into one table has benefits and drawbacks.

Nothing says you can't do both. It goes against relational DB paradigms to repeat information, but if the goal is performance you may have to break the paradigms.

John at CashCommons
Yes, putting the tags directly into the articles table would certainly be an option, although there are a few drawbacks to this method. Storing the 5 tags in a comma-separated field like (tag1,2,3,4) would be easy. The question is whether searching would be any faster. For example, if someone wants to see everything with tag1, you have to go through the whole articles table. That would be fewer rows than going through the tag_to_articles table, but then again the tag_to_articles table is slimmer. Another thing is that you have to explode the field every time in PHP; I don't know how much time that takes.
Saif Bechan
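A small sketch of the comma-separated variant discussed in this comment, assuming SQLite via Python's sqlite3 (the table contents and function names are invented for the example). It shows one pitfall beyond speed: a plain LIKE search for one tag also matches longer tags that contain it, so the field has to be wrapped in delimiters. Either way the pattern starts with a wildcard, so an ordinary index on the column cannot be used and the whole table is scanned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO articles (title, tags) VALUES (?, ?)",
    [("First", "java,sql"), ("Second", "javascript,html")],
)

def naive_search(tag):
    """Substring match: 'java' also matches 'javascript'."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE tags LIKE ? ORDER BY id",
        ("%" + tag + "%",),
    )
    return [row[0] for row in rows]

def delimited_search(tag):
    """Wrap the field and the tag in commas so only whole tags match."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE ',' || tags || ',' LIKE ? ORDER BY id",
        ("%," + tag + ",%",),
    )
    return [row[0] for row in rows]

def tags_of(title):
    """Split the stored field back into a list (the equivalent of PHP's explode)."""
    (tags,) = conn.execute(
        "SELECT tags FROM articles WHERE title = ?", (title,)
    ).fetchone()
    return tags.split(",")

print(naive_search("java"))      # matches both rows
print(delimited_search("java"))  # matches only the row tagged exactly 'java'
print(tags_of("Second"))
```

Splitting a short string like this is cheap in most languages; the full scan on every tag search is the more likely cost.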
If you do both (tags with the article and in a separate table), you get performance for both post-centric and tag-centric searches. The tradeoff is the burden of maintaining the repeated information. Also, by limiting the number of tags, you can put each into its own column. Just SELECT * FROM articles WHERE XXXXX and go; no explode necessary.
John at CashCommons
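The one-column-per-tag idea could be sketched roughly as follows, assuming SQLite via Python's sqlite3 and a limit of five tags. The column names tag1 through tag5 and the helper functions are assumptions made for this example. If the tags are also kept in a separate mapping table, that copy has to be updated alongside these columns.

```python
import sqlite3

TAG_COLUMNS = ("tag1", "tag2", "tag3", "tag4", "tag5")  # assumed column names

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE articles (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    tag1 TEXT, tag2 TEXT, tag3 TEXT, tag4 TEXT, tag5 TEXT
)
""")

def add_article(title, tags):
    """Store up to five tags, one per column; unused columns stay NULL."""
    if len(tags) > len(TAG_COLUMNS):
        raise ValueError("at most 5 tags per article")
    padded = list(tags) + [None] * (len(TAG_COLUMNS) - len(tags))
    conn.execute(
        "INSERT INTO articles (title, tag1, tag2, tag3, tag4, tag5) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [title] + padded,
    )

def articles_with_tag(tag):
    """Find articles where any of the five tag columns equals the tag."""
    rows = conn.execute(
        "SELECT title FROM articles "
        "WHERE ? IN (tag1, tag2, tag3, tag4, tag5) ORDER BY id",
        (tag,),
    )
    return [row[0] for row in rows]

add_article("Designing a tag system", ["database", "tagging"])
add_article("Indexing basics", ["database", "performance"])
print(articles_with_tag("database"))
```

Reading an article's tags needs no join and no string splitting, but a tag-centric search has to check all five columns, which is where the separate mapping table would still help.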
+6  A: 

I believe you'll find this blog post interesting: Tags: Database schemas

Nick D
Nice post, thank you.
Saif Bechan
A: 

If your database supports indexable arrays (like PostgreSQL, for example), I would recommend an entirely denormalized solution - store tags as an array of strings on the same table. If not, a secondary table mapping objects to tags is the best solution. If you need to store extra information against tags, you can use a separate tags table, but there's no point in introducing a second join for every tag lookup.

Nick Johnson
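A rough sketch of the array approach, assuming PostgreSQL with a text[] column and a GIN index, accessed from Python through a DB-API driver such as psycopg2. The table, column, and index names are hypothetical, and the snippet needs a running PostgreSQL server to do anything useful; it only defines the SQL and a lookup helper.

```python
# Assumes PostgreSQL plus a DB-API driver such as psycopg2.
# Nothing here connects to a database on import; names are hypothetical.

SCHEMA_SQL = """
CREATE TABLE articles (
    id    serial PRIMARY KEY,
    title text   NOT NULL,
    tags  text[] NOT NULL DEFAULT '{}'
);
-- A GIN index lets PostgreSQL answer array-containment queries without a full scan.
CREATE INDEX articles_tags_gin ON articles USING GIN (tags);
"""

# @> is PostgreSQL's array "contains" operator.
FIND_BY_TAG_SQL = "SELECT id, title FROM articles WHERE tags @> %s ORDER BY id"

def find_by_tag(conn, tag):
    """Return (id, title) rows for articles whose tags array contains `tag`.

    `conn` is expected to be an open DB-API connection (psycopg2 assumed).
    """
    cur = conn.cursor()
    try:
        # psycopg2 adapts a Python list to a PostgreSQL array literal.
        cur.execute(FIND_BY_TAG_SQL, ([tag],))
        return cur.fetchall()
    finally:
        cur.close()
```

With a connection in hand, usage would be along the lines of find_by_tag(conn, "tagging"). There is no join at all in this layout, which is the point of the suggestion.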
A: 

Your proposed three-table implementation will work for tagging.

Stack Overflow, however, uses a different implementation. They store the tags in a varchar column in the posts table as plain text and use full-text indexing to fetch posts that match the tags. For example, posts.tags = "algorithm system tagging best-practices". I am sure that Jeff has mentioned this somewhere, but I forget where.

Juha Syrjälä