How do websites like Digg, Del.icio.us, and StackOverflow implement tagging?

I know this other question has an accepted answer recommending a many-to-many relation with a cross-reference table. But how do the "big boys" do it? The same way? How does it scale?
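For reference, the many-to-many layout mentioned above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the table names (item, tag, item_tag) and the helper functions are made up for the example and are not taken from any of the sites in question.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical three-table tag schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE tag  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        -- Cross-reference table: one row per (item, tag) pair.
        CREATE TABLE item_tag (
            item_id INTEGER NOT NULL REFERENCES item(id),
            tag_id  INTEGER NOT NULL REFERENCES tag(id),
            PRIMARY KEY (item_id, tag_id)
        );
        -- Second index so lookups that start from a tag do not scan the table.
        CREATE INDEX item_tag_by_tag ON item_tag (tag_id, item_id);
        """
    )
    return conn


def add_item(conn, title, tags):
    """Insert an item and link it to each of its tags, creating tags as needed."""
    item_id = conn.execute(
        "INSERT INTO item (title) VALUES (?)", (title,)
    ).lastrowid
    for name in set(tags):
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tag WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO item_tag (item_id, tag_id) VALUES (?, ?)",
            (item_id, tag_id),
        )
    return item_id


def items_with_all_tags(conn, tags):
    """Return titles of items carrying every tag in `tags` (two JOINs)."""
    tags = sorted(set(tags))
    if not tags:
        return []
    placeholders = ",".join("?" for _ in tags)
    rows = conn.execute(
        f"""
        SELECT i.title
        FROM item AS i
        JOIN item_tag AS m ON m.item_id = i.id
        JOIN tag AS t ON t.id = m.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY i.id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY i.id
        """,
        (*tags, len(tags)),
    ).fetchall()
    return [row[0] for row in rows]
```

The two JOINs in items_with_all_tags are the cost the question is asking about: every tag intersection has to go through the cross-reference table.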

+8  A: 

Here is the oft-quoted article which breaks down tagging schemas by real performance metrics: http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html

The author notes that, according to the founder of Delicious, using an RDBMS for tagging simply does not scale to many millions of items under load. An alternative such as Lucene may fit better in that case.
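To give a feel for why a search engine can fit better, here is a toy sketch of the inverted-index idea that engines like Lucene are built around. It is plain Python, not Lucene's API, and the TagIndex class is hypothetical; a real engine adds persistence, compression, and ranking on top.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory inverted index: tag -> set of item ids (illustration only)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, tags):
        """Record that `item_id` carries each tag in `tags`."""
        for tag in tags:
            self._postings[tag.lower()].add(item_id)

    def search(self, tags):
        """Return the ids of items that carry every tag in `tags`."""
        # .get avoids creating empty postings for unknown tags.
        sets = [self._postings.get(tag.lower(), set()) for tag in tags]
        if not sets:
            return set()
        # Start from the smallest postings set so the intersection stays cheap.
        sets.sort(key=len)
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
            if not result:
                break
        return result
```

A tag intersection here is a set intersection over precomputed postings rather than a JOIN, which is the property that makes this layout attractive at large scale.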

Rex M
+1. Thanks. I was going to include a link to this article in my question but couldn't remember how to find it. So, out of these different strategies, any idea which one StackOverflow or Digg uses?
tyndall
@Tyndall SO uses SQL, but SO does not have many millions (tens or hundreds of millions) of items to be tagged in multiple dimensions like Digg or Delicious might.
Rex M
@Rex, can you please suggest any open source tools to test database performance and to help rewrite queries that are underperforming? Thanks in advance.
harigm
A: 

I am sure that the additional JOIN queries would be too expensive in a very large system.

The tags are either stored non-normalised in the main table, or there may be a separate tag table which has one row for each tag applied to an item.
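A minimal sketch of the first option, with the tags kept non-normalised in the main table, might look like this. It uses Python's sqlite3 module; the schema and function names are hypothetical, and it assumes tags never contain spaces or the LIKE wildcards % and _.

```python
import sqlite3


def build_flat_db():
    """Create an in-memory table that stores all tags in one text column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, tags TEXT NOT NULL)"
    )
    return conn


def add_flat_item(conn, title, tags):
    """Store the tags space-separated, padded with a space on both ends."""
    packed = " " + " ".join(sorted(set(tags))) + " "
    conn.execute(
        "INSERT INTO item (title, tags) VALUES (?, ?)", (title, packed)
    )


def flat_items_with_all_tags(conn, tags):
    """Return titles of items whose tags column contains every requested tag."""
    tags = sorted(set(tags))
    if not tags:
        return []
    # The surrounding spaces stop 'sql' from matching inside 'nosql'.
    where = " AND ".join("tags LIKE ?" for _ in tags)
    params = [f"% {tag} %" for tag in tags]
    rows = conn.execute(
        f"SELECT title FROM item WHERE {where} ORDER BY id", params
    ).fetchall()
    return [row[0] for row in rows]
```

This avoids the JOINs, but a LIKE pattern with a leading wildcard generally cannot use an ordinary index, so each lookup tends to scan the table. Renaming or counting tags also gets harder.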

Jon Winstanley