views:

199

answers:

2

from this post http://stackoverflow.com/questions/334183/what-is-the-most-efficient-way-to-store-tags-in-a-database

It was recommended to store tag tables like this.

Table: Item
Columns: ItemID, Title, Content

Table: Tag
Columns: TagID, Title

Table: ItemTag
Columns: ItemID, TagID

And another SO post said the same thing. Can anyone explain why tags should be stored like this? I am guessing ItemID is some internal val, title is the tag name (c++, sql, noob, etc) content is whatever else data i want to store with my item. why not something like

Table: Item
Columns: ItemID, Title, <more data i want>

Table: TagList
Columns: ItemID, Title

title in item being "item name" and tag title being "c++" "sql" "noob" "etc"

+1  A: 

Why? It's normalized. ItemID will be the primary key (perhaps a surrogate or identity), TagID will almost certainly be a surrogate/identity and for constraints/performance you'll have a unique constraint and/or index (perhaps even clustered on the tag.title).

In your example, TagList will not have a primary key (unless perhaps both columns will be the primary key), and it cannot be normalized to remove tag title redundancy (two items with the same tag will have separate entries). In addition, because tags aren't entities in their own right in the second model, they can't be linked to a different type of entity with another ItemTag table.

Cade Roux
+5  A: 

There's nothing wrong with the second design you show, the one with the TagList table, except that it takes more space.

That is, if you tag 10,000 items with the tag "database-design", then in the two-table design, you have to store that string 10,000 times. If space-efficiency is more important, you could use the three-table design, which would only store the 4-byte integer ID for "database-design" 10,000 times. A savings of 10 * 10,000 bytes.

Another difference is that in the three-table design, you could have more than one row in the Tag table with the same string, even though they have different integer ID values. So in the ItemTag table, they would appear to be different tags, and you'd never know that they're actually tagged similarly. Whereas in the two-table design, tags with the same spelling become grouped together implicitly.

Another point: if you have the need to change the spelling of tags, then in the two-table design you have to update many rows. In the three-table design, you only need to update a single row.

And finally, if you commonly need a list of unique tags, it's more speedy to query the Tags table in the three-table design, instead of needing a SELECT DISTINCT tag FROM TagList every time you need the unique list. And the latter only gives you a list of tags in use, not a list of all eligible tags.

Bill Karwin
That answer is pure gold :D
acidzombie24
Thanks! I guess the footnote would be, both ways are legitimate, each has their own advantages. Choose the design based on which parts you need to be more efficient. I.e. it depends how your particular application needs to use the data.
Bill Karwin