views:

87

answers:

4

I have a Questions model, and just like StackOverflow, each question can be tagged with multiple descriptive tags by a user.

What I'm trying to decide is whether it's necessary for the Tags associated with a question to be stored in a separate table in the database.

Or could I store the Tags as a single field of the Questions table as a list of space-separated strings?

I'm not sure which makes more sense - is there any good reason to separate the data?

+1  A: 

I would put the questions in 1 table, the tags in 1 table, and have a seperate table to connect the tags to questions. This would be the best way to build that database. It keeps all tags consistant and highly reduces redundency.

By seperating the data like this, your can assure that searching for a specific tag will bring back the same items. You don't have to worry about whether the tag is spelled the same throughout all the questions. Also, you can limit the tag options easier this way.

Justin Balvanz
A: 

You should definitely store the tags in a separate table, it makes everything easier, and that's the whole idea of a 'relational' database.

cloudhead
+3  A: 

Separating tags into their own table, plus a further table with a many:many relationship between Tags and Questions, is what's known in relational land ad "normal form". It makes it easier and faster to perform tasks such as getting all questions tagged with a certain tag, finding the most popular tags, &c.

(Just in case you don't know -- a "many:many relationship" is a table with just two columns [a foreign key into Tags and one into Questions] and no uniqueness constraints).

Alex Martelli
Ok, so your suggesting a Tags table and a TagsQuestions table. The Tags table would have content:string and tag_id:integer and TagsQuestions would have tag_id:integer(fk) and question_id:integer(fk).
Charlie K
yep, that's correct.
Alex Martelli
+3  A: 

Using a comma-separated string for a multi-valued attribute is another SQL Antipattern. :-)

  • How long does the string need to be? Stated another way: how many tags can a given entry have? (It depends on how long the individual tags are.)

  • How do you account for strings that contain the separator character? What if a character you currently use as a separator becomes a legitimate character in a tag?

  • How do you insert or delete elements from the list in SQL? (You have to fetch the whole list into the application, explode the list, filter through it, and re-post it to the database.)

  • How can you do aggregates like COUNT(*) in SQL?

  • How do you search efficiently for all entries that share a given tag? (You have to use costly pattern-matching queries.)

The solution is to use a separate table, as most other folks on this thread are advising.

Bill Karwin
Hey, it's you again :) Ok, you make a good point. On your second point about separator characters, though, I'm assuming there's some validation of the input received from the client side. Just as S.O. only allows you to enter letters and dashes for the tags here, I would do the same check in my front-end. So a separator would be a space character - it couldn't be anything else once it got to the database. Maybe I misunderstand your point.
Charlie K
The point is that eventually, someone will demand a tag that contains a space. If not a user, then a Marketing VP who can actually make such a demand. If you store one tag per row in a many-to-many table, you already support any arbitrary characters in the tags. If you store the list of tags as a space-separated string, then you have to update all existing rows, and also fix any code that reads or writes the tag lists.
Bill Karwin