views:

193

answers:

3

Scenario

I am building a database that contains a series of different tables. These consist of a COMMENTS table, a BLOGS table & an ARTICLES table. I want to be able to add new items to each table, and tag them with between 0 and 5 tags to help the user search for particular information that is relevant more easily.

Initial thoughts for architecture

My first thoughts were to have a centralised table of TAGS. This table would list all of the available tags using a TagID field & a TagName field. Since each item can have many tags and each tag can have many items, I would need a MANY-TO-MANY relationship between each item table and the TAGS table.

For Example:

Many COMMENTS can have many TAGS. Many TAGS can have many COMMENTS.

Many ARTICLES can have many TAGS. Many TAGS can have many ARTICLES.

etc.....

Current Understanding

From previous experience I understand that a way of implementing this structure in T-SQL is to have an ajoining table between the COMMENTS table and the TAG table. This ajoining table would contain the CommentID & the TagID, as well as its own unique CommentTagID. This structure would also apply to all other items.

Questions

Firstly is this the right way to go about implementing such a database architecture? If not, what other methods would be feasible? Since the database will eventually contain a lot of information, I need to ensure that it is scalable. Is this a scalable implementation? If I had lots of these tables would this architecture make CRUD operations very slow? Should I use GUIDs or Incrementing INTs for the ID fields?

Help & suggestions would be appreciated greatly.

Thankyou.

A: 

usually many-to-many relationship implemented exactly as you describe it.

Auto-incrementing IDs it is good idea since it guarantee that they will be unique.

And you can use guids if you want to tag comments and articles with the same tag(instead of 6 tables you need just 5). But searching with guids may be more slow.

Trickster
A: 

Keeping a centralized table of tags is a good idea if you will ever need to do one of the following:

  1. Build a complete list of all tags (that is mixing blog tags, comment tags and article tags)
  2. Update the tags so that they get updated everywhere: so that when you change sqlserver to sql-server, it gets changed anywhere: in blogs, articles and comments.

Option 1 is very useful to build the tag clouds so I'd recommend to build a table of tags and reference it from your tables.

If you won't ever need to update the tags as described in the option 2, you don't ever need surrogate key for them.

You will most probably need a UNIQUE constraint on them anyway and there is no point not to make it a PRIMARY KEY, if you are not going to update them.

This will also save you lots of joins: you don't need to join with the tags table to show the tags.

GUIDs are more simple to manage, but theу make the indexes and link tables quite large in size.

You can assign a numerical identifier to each table and link like this:

tTag (tag VARCHAR(30) NOT NULL PRIMARY KEY)

tTaggable (type INT NOT NULL, id INT NOT NULL, PRIMARY KEY (type, id))

tTagLink (
        tag VARCHAR(30) NOT NULL FOREIGN KEY REFERENCES tTag,
        type INT NOT NULL, id INT NOT NULL,
        PRIMARY KEY (tag, type, id),
        FOREIGN KEY (type, id) REFERENCES tTaggable
        )

tBlog (
        id INT NOT NULL PRIMARY KEY,
        type INT NOT NULL, CHECK(type = 1),
        FOREIGN KEY (type, id) REFERENCES tTaggable,
        …)

tArticle (
        id INT NOT NULL,
        blog INT NOT NULL FOREIGN KEY REFERENCES tBlog,
        type INT NOT NULL, CHECK(type = 2),
        FOREIGN KEY (type, id) REFERENCES tTaggable,
        …)


tComment (
        id INT NOT NULL PRIMARY KEY,
        article INT NOT NULL FOREIGN KEY REFERENCES tArticle,
        type INT NOT NULL, CHECK(type = 3),
        FOREIGN KEY (type, id) REFERENCES tTaggable,
        …)

Note that if you want to delete a blog, an article or a comment, you should delete from tTaggable as well.

This way, tTaggable is only used to ensure the referential integrity. To query all tags for an article, you just issue this query:

SELECT  tag
FROM    tTagLink
WHERE   type = 2
        AND id = 1234567

, so you get all tags by querying a single table, without any joins.

Quassnoi
+1  A: 

You may also want to look at WordPress schema and database description to see how others are solving a similar problem.

Damir Sudarevic
cheers, saw "wordpress" and got insulted for a minute there......till I realized what it was.
Goober