I have a table of tags

ID      tag
---     ------
1       tagt
2       tagb
3       tagz
4       tagn

In my items table, I store those tags in a serialized, comma-delimited format:

ID       field1       tags
----     ------       -----
1        value1       tagt,tagb
2        value2       tagb
3        value3       tagb,tagn
4        value4

When I need to find the items that have a given tag, I plan to deserialize the tags. But I'm not sure how to do that, or whether it would be better to have a third table for the associations between the tags and the items instead.

+2  A: 

Yes, if I were you, I would use a third table for the item-tag association. It would be easier and faster to search for tagged messages and you could also enforce that no same tag will be added twice to the same message with your database design.
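
To illustrate the search side, here is a minimal sketch using Python's built-in sqlite3 module with the sample rows from the question. The association table name `item_tags` and the helper `items_with_tag` are made up for this example, and your database engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, tag TEXT NOT NULL);
    CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT NOT NULL);
    -- Hypothetical association table: one row per (item, tag) pair.
    CREATE TABLE item_tags (
        item_id INTEGER NOT NULL REFERENCES items(id),
        tag_id  INTEGER NOT NULL REFERENCES tags(id)
    );
    INSERT INTO tags  VALUES (1,'tagt'),(2,'tagb'),(3,'tagz'),(4,'tagn');
    INSERT INTO items VALUES (1,'value1'),(2,'value2'),(3,'value3'),(4,'value4');
    INSERT INTO item_tags VALUES (1,1),(1,2),(2,2),(3,2),(3,4);
""")

def items_with_tag(conn, tag):
    """Return (id, field1) for every item carrying the given tag name."""
    return conn.execute(
        """
        SELECT items.id, items.field1
        FROM items
        JOIN item_tags ON item_tags.item_id = items.id
        JOIN tags      ON tags.id = item_tags.tag_id
        WHERE tags.tag = ?
        ORDER BY items.id
        """,
        (tag,),
    ).fetchall()

print(items_with_tag(conn, "tagb"))
```

No string splitting is involved: the lookup is a plain join, which the database can serve from indexes on the association table.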

shinkou
Re `enforce that no same tag will be added twice to the same message`, I'm thinking of deserializing the existing tags and comparing them to the new tag to decide whether it should be added.
donpal
@donpal Using a separate table like everyone is suggesting, you can put a single UNIQUE or PRIMARY KEY constraint across both columns (one composite index covering both columns, not two separate indexes), and duplicates won't be added. No extra code is needed to check whether the tag is already there.
Syntax Error
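
A rough sketch of the composite-key idea from the comment above, again with Python's sqlite3 module. The table name `item_tags` and the helper `add_tag` are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_tags (
        item_id INTEGER NOT NULL,
        tag_id  INTEGER NOT NULL,
        -- One key over the pair, so each (item, tag) combination is unique.
        PRIMARY KEY (item_id, tag_id)
    )
""")

def add_tag(conn, item_id, tag_id):
    """Return True if the pair was stored, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO item_tags (item_id, tag_id) VALUES (?, ?)",
                (item_id, tag_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the duplicate; no manual comparison needed.
        return False

first = add_tag(conn, 1, 2)   # new pair
second = add_tag(conn, 1, 2)  # same pair again
other = add_tag(conn, 1, 4)   # same item, different tag
row_count = conn.execute("SELECT COUNT(*) FROM item_tags").fetchone()[0]
```

The same item can still carry several tags; only an exact repeat of an (item, tag) pair is refused.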
+2  A: 

In general, one tries to use the power of a relational database to one's advantage. Serializing and deserializing a string every time any tagging feature is accessed is incredibly inefficient, and it is a poor design for other reasons as well.

For example, what if an (undefined) tag tagc is accidentally added to the tags field of an item? The code would be very difficult to debug.

Creating a third table that contains records of associated IDs is a much better option. In addition, you could create the taggings table to look something like this:

ID    item_id    tag_id    tagger_id    etc
-------------------------------------------

and store meta-data along with each tagging.
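
As a sketch of how such a taggings table could also guard against undefined tags, the example below uses Python's sqlite3 module with foreign keys. The column names follow the layout above; the helper `tag_item` and the tagger id 7 are invented for the example, and SQLite needs foreign key enforcement switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
conn.executescript("""
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
    CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT NOT NULL);
    CREATE TABLE taggings (
        id        INTEGER PRIMARY KEY,
        item_id   INTEGER NOT NULL REFERENCES items(id),
        tag_id    INTEGER NOT NULL REFERENCES tags(id),
        tagger_id INTEGER  -- meta-data: who applied the tag (hypothetical)
    );
    INSERT INTO tags  VALUES (1,'tagt'),(2,'tagb'),(3,'tagz'),(4,'tagn');
    INSERT INTO items VALUES (1,'value1'),(2,'value2');
""")

def tag_item(conn, item_id, tag_id, tagger_id):
    """Return True if the tagging was stored, False if a constraint refused it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO taggings (item_id, tag_id, tagger_id) VALUES (?, ?, ?)",
                (item_id, tag_id, tagger_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = tag_item(conn, 1, 2, 7)         # tag 2 ('tagb') exists
rejected = tag_item(conn, 1, 99, 7)  # no tag has id 99
```

With a comma-delimited string, nothing would stop an unknown tag from being written; here the database refuses it at insert time.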

Becoming over-dependent on strings for storing structured data is both inefficient and unmaintainable. To avoid "stringly typed" code, modeling the domain further is better in these situations.

nickname
+1  A: 

To explain what's wrong with your tagging approach, I'm going to show you a similarly flawed database:

table Students
--------------------
Id     grade          names
--     -----          -----
1      Kindergarten   timmy, james, sarah, suzie, etc...
2      1st            annie, chris, laura, etc...
3      2nd            robert, kimmy, sarah, jason, etc...

This is wrong for all of the same reasons. You can't do anything with the names. You can't count them, filter by them, or even return just one of them.
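
For contrast, here is a small sketch of the same data stored one student per row, using Python's sqlite3 module. Only the names listed above are included, and the table layout is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, grade TEXT NOT NULL, name TEXT NOT NULL)"
)
rows = [
    ("Kindergarten", "timmy"), ("Kindergarten", "james"),
    ("Kindergarten", "sarah"), ("Kindergarten", "suzie"),
    ("1st", "annie"), ("1st", "chris"), ("1st", "laura"),
    ("2nd", "robert"), ("2nd", "kimmy"), ("2nd", "sarah"), ("2nd", "jason"),
]
conn.executemany("INSERT INTO students (grade, name) VALUES (?, ?)", rows)

# Count students per grade.
counts = dict(
    conn.execute("SELECT grade, COUNT(*) FROM students GROUP BY grade").fetchall()
)

# Filter by name: which grades have a student called sarah?
sarah_grades = [
    grade
    for (grade,) in conn.execute(
        "SELECT grade FROM students WHERE name = ? ORDER BY id", ("sarah",)
    )
]

# Return just one name from a grade.
one_name = conn.execute(
    "SELECT name FROM students WHERE grade = ? ORDER BY id LIMIT 1", ("1st",)
).fetchone()[0]
```

Each of the three operations that the comma-separated column made impossible becomes a one-line query.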

What if you want to do cool stuff with your tags later that you're not thinking of right now?

Syntax Error