How do websites like Digg, Del.icio.us, and StackOverflow implement tagging?
I know this other question has an accepted answer recommending a many-to-many relation with a cross-reference table. But how do the "big boys" do it? The same way? How well does it scale?
Here is the oft-quoted article which breaks down tagging schemas by real performance metrics: http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
The author notes that, according to the founder of Delicious, using an RDBMS for tagging simply does not scale to many millions of items under load. An alternative such as Lucene may be a better fit in that case.
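To make the Lucene suggestion a bit more concrete: a search engine of that kind is built around an inverted index, which maps each term to the set of documents containing it. The sketch below is not Lucene and is not how any of the sites mentioned actually work; it is a toy in-memory version in Python, and the `TagIndex` class and its method names are made up for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory inverted index: tag -> set of item ids (hypothetical)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, tags):
        """Record that item_id carries each of the given tags."""
        for tag in tags:
            self._postings[tag.lower()].add(item_id)

    def search(self, *tags):
        """Return the ids of items that carry ALL of the given tags."""
        if not tags:
            return set()
        # Look up each tag's posting set without creating empty entries,
        # then intersect starting from the smallest set.
        postings = sorted(
            (self._postings.get(tag.lower(), set()) for tag in tags),
            key=len,
        )
        result = set(postings[0])
        for posting in postings[1:]:
            result &= posting
            if not result:
                break
        return result


index = TagIndex()
index.add(1, ["python", "sql"])
index.add(2, ["python", "lucene"])
index.add(3, ["sql"])

print(index.search("python", "sql"))
```

A multi-tag query here is a set intersection rather than a JOIN, which is the main reason this shape is attractive for tag lookups. A real engine adds persistence, compression, and ranking on top.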
I am sure that the additional JOINs a fully normalised schema requires would be too expensive in a very large system.
More likely, the tags are either stored non-normalised in the main table, or there is a separate tag table that has a row for each tagged item.
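A rough sketch of those two layouts, using Python's built-in `sqlite3` module. Everything here is an assumption made for illustration: the table and column names, the space-delimited tag string, and the reading of "a row for each tagged item" as one row per item/tag pair. It is not the schema of any of the sites mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layout A: tags kept non-normalised in the main table
    CREATE TABLE item_a (id INTEGER PRIMARY KEY, title TEXT, tags TEXT);

    -- Layout B: separate table with one row per (item, tag) pair
    CREATE TABLE item_b (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item_tag (
        item_id INTEGER,
        tag     TEXT,
        PRIMARY KEY (item_id, tag)
    );
    CREATE INDEX item_tag_by_tag ON item_tag (tag, item_id);
""")

# Hypothetical sample data
items = [
    (1, "Intro", ["python", "sql"]),
    (2, "Search", ["python", "lucene"]),
    (3, "Joins", ["sql"]),
]

for item_id, title, tags in items:
    # Layout A stores " python sql " so that every tag is space-delimited.
    conn.execute(
        "INSERT INTO item_a VALUES (?, ?, ?)",
        (item_id, title, " " + " ".join(tags) + " "),
    )
    conn.execute("INSERT INTO item_b VALUES (?, ?)", (item_id, title))
    conn.executemany(
        "INSERT INTO item_tag VALUES (?, ?)",
        [(item_id, tag) for tag in tags],
    )


def items_with_tag_a(tag):
    """Layout A: substring match on the delimited tag column.

    Assumes the tag contains no spaces and no LIKE wildcards (% or _).
    """
    rows = conn.execute(
        "SELECT id FROM item_a WHERE tags LIKE ? ORDER BY id",
        (f"% {tag} %",),
    )
    return [row[0] for row in rows]


def items_with_tag_b(tag):
    """Layout B: equality lookup on the tag table, no JOIN needed for ids."""
    rows = conn.execute(
        "SELECT item_id FROM item_tag WHERE tag = ? ORDER BY item_id",
        (tag,),
    )
    return [row[0] for row in rows]


print(items_with_tag_a("sql"), items_with_tag_b("sql"))
```

Layout A avoids any second table, but a leading-wildcard `LIKE` generally cannot use an ordinary index. Layout B stores the tag text redundantly, but finding item ids by tag is a single indexed lookup.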