I am building a system that stores articles and the tags that categorize them. Standard stuff, similar to how this website does it. My question is whether I should store the tags in a separate table that contains just tags and article IDs, or store them in an extra column of the articles table. My first instinct would be to normalize the database and have two tables. The problem is that the interface through which the user administers the tags is a simple text box with all tags separated by commas. So when the user commits his changes, in order to find out which tags were added, changed or removed, I would need to first query the database, compare the results with the new data tag by tag, and then process the changes accordingly. That is a process with a huge overhead compared with simply updating the one field in the one row of the articles table. How would you do it, or is there a third option I haven't considered?

P.S. I am stuck with a relational database for this project.

+1  A: 

If you are using a separate table, rather than trying to figure out which tags have changed each time, simply delete all for the given article ID, and then insert all of the supplied tags - this should present very little overhead.

In a tagged system the performance that would normally be most important is the retrieval of tags and / or the retrieval of the related content. Using a separate table with an indexed tag column should provide very fast lookup in a situation where an item can have any number of tags.
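As a rough sketch of the separate-table approach described above, the following uses Python's built-in `sqlite3` module as a stand-in for whatever relational database is actually in use. The table names (`articles`, `article_tags`), the index name and the helper functions are hypothetical, chosen only for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical articles table plus a separate tag table."""
    conn.executescript("""
        CREATE TABLE articles (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE article_tags (
            article_id INTEGER NOT NULL REFERENCES articles(id),
            tag        TEXT    NOT NULL,
            PRIMARY KEY (article_id, tag)
        );
        -- Index on the tag column so "all articles with tag T" is a fast lookup.
        CREATE INDEX idx_article_tags_tag ON article_tags(tag);
    """)


def replace_tags(conn, article_id, tags):
    """Delete every tag row for the article, then insert the supplied tags."""
    conn.execute("DELETE FROM article_tags WHERE article_id = ?", (article_id,))
    conn.executemany(
        "INSERT INTO article_tags (article_id, tag) VALUES (?, ?)",
        [(article_id, tag) for tag in sorted(set(tags))],  # drop duplicates
    )


def articles_with_tag(conn, tag):
    """Return the IDs of all articles carrying the given tag."""
    rows = conn.execute(
        "SELECT article_id FROM article_tags WHERE tag = ? ORDER BY article_id",
        (tag,),
    )
    return [row[0] for row in rows]
```

With this layout the write path is two statements per save, and the read path is a single indexed query, which is the trade-off the answer argues for.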

Macros
I thought about it, but I don't have a way to guarantee the atomicity of the operation: if something goes wrong after I delete the tags, I am left with an uncategorized article. Besides, I may want to track which tags are new and which were removed.
Julio Garcia
This still shouldn't present much overhead, and in fact should be easier to work out with a separate table than with string comparison in a single column. The key is to optimise for the retrieval of information, as it will be the most common usage, even at the expense of additional overhead when inserting / updating.
Macros
A: 

You need to normalize the database in order to run queries such as 'find all articles with tag T'.

I don't think that there will really be that much overhead in grabbing all of the tags to compare them with the new tags, assuming that you've applied correct indexes.

Personally I wouldn't delete all the tags then insert all the new ones, because I might want to do things like audit when individual tags are entered.
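One way to sketch the compare-and-update approach: read the article's current tags, diff them against the contents of the text box with set operations, and touch only the rows that changed. This uses Python's `sqlite3` as a stand-in database. The `article_tags` table, the function names, and the choice to trim and lowercase tags are all assumptions made for the example.

```python
import sqlite3

# Hypothetical tag table; the primary key also serves as the lookup index.
SCHEMA = """
    CREATE TABLE article_tags (
        article_id INTEGER NOT NULL,
        tag        TEXT    NOT NULL,
        PRIMARY KEY (article_id, tag)
    );
"""


def parse_tags(text):
    """Split the comma-separated text box into a set of normalized tags."""
    return {part.strip().lower() for part in text.split(",") if part.strip()}


def current_tags(conn, article_id):
    """Fetch the tags currently stored for one article."""
    rows = conn.execute(
        "SELECT tag FROM article_tags WHERE article_id = ?", (article_id,)
    )
    return {row[0] for row in rows}


def sync_tags(conn, article_id, text):
    """Insert only new tags, delete only removed ones, and report both."""
    new = parse_tags(text)
    old = current_tags(conn, article_id)
    added, removed = new - old, old - new
    conn.executemany(
        "INSERT INTO article_tags (article_id, tag) VALUES (?, ?)",
        [(article_id, tag) for tag in added],
    )
    conn.executemany(
        "DELETE FROM article_tags WHERE article_id = ? AND tag = ?",
        [(article_id, tag) for tag in removed],
    )
    return sorted(added), sorted(removed)
```

The returned lists of added and removed tags could be written to an audit table, and rows for unchanged tags are never touched, so any timestamps on them would survive.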

If you're using SQL Server 2008 then I suggest that you look at the MERGE command.

Yellowfog
I am using SQL Server Compact so no MERGE command. At least none that I could find.
Julio Garcia
I guess that you'll have to write separate insert, update and delete statements for each case then. Note that if you wrap them in a transaction then you get atomicity.
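A minimal sketch of the transaction point, again using Python's `sqlite3` as a stand-in rather than SQL Server Compact. The `article_tags` table and the function name are hypothetical. In `sqlite3`, using the connection as a context manager commits if the block succeeds and rolls back if an exception escapes it.

```python
import sqlite3

# Hypothetical tag table; NOT NULL lets a bad value make the insert fail.
SCHEMA = """
    CREATE TABLE article_tags (
        article_id INTEGER NOT NULL,
        tag        TEXT    NOT NULL,
        PRIMARY KEY (article_id, tag)
    );
"""


def replace_tags_atomically(conn, article_id, tags):
    """Run the delete and the inserts as one all-or-nothing unit."""
    with conn:  # commit on success, roll back if any statement raises
        conn.execute(
            "DELETE FROM article_tags WHERE article_id = ?", (article_id,)
        )
        conn.executemany(
            "INSERT INTO article_tags (article_id, tag) VALUES (?, ?)",
            [(article_id, tag) for tag in tags],
        )


def tags_for(conn, article_id):
    """Return the stored tags for one article in alphabetical order."""
    rows = conn.execute(
        "SELECT tag FROM article_tags WHERE article_id = ? ORDER BY tag",
        (article_id,),
    )
    return [row[0] for row in rows]
```

If one of the inserts fails, the delete is undone as well, so the article keeps its previous tags instead of ending up uncategorized.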
Yellowfog