Using SO as an example, what is the most sensible way to manage tags if you anticipate they will change often?
Way 1: Seriously denormalized (comma delimited)
table posts +--------+-----------------+ | postId | tags | +--------+-----------------+ | 1 | c++,search,code |
Here tags are comma delimited.
Pros: Tags are retrieved at once with a single select
query. Updating tags is simple. Easy and cheap to update.
Cons: Extra parsing on tag retrieval, difficult to count how many posts use which tags.
(alternatively, if limited to something like 5 tags)
table posts +--------+-------+-------+-------+-------+-------+ | postId | tag_1 | tag_2 | tag_3 | tag_4 | tag_5 | +--------+-------+-------+-------+-------+-------+ | 1 | c++ |search | code | | |
Way 2: "Slightly normalized" (separate table, no intersection)
table posts +--------+-------------------+ | postId | title | +--------+-------------------+ | 1 | How do u tag? | table taggings +--------+---------+ | postId | tagName | +--------+---------+ | 1 | C++ | | 1 | search |
Pros: Easy to see tag counts (count(*) from taggings where tagName='C++'
).
Cons: tagName will likely be repeated many, many times.
Way 3: The cool kid's (normalized with intersection table)
table posts +--------+---------------------------------------+ | postId | title | +--------+---------------------------------------+ | 1 | Why is a raven like a writing desk? | table tags +--------+---------+ | tagId | tagName | +--------+---------+ | 1 | C++ | | 2 | search | | 3 | foofle | table taggings +--------+---------+ | postId | tagId | +--------+---------+ | 1 | 1 | | 1 | 2 | | 1 | 3 |
Pros:
- No repeating tag names.
- More girls will like you.
Cons: More expensive to change tags than way #1.