views: 126
answers: 4
I'm working on a little piece of blog software, and I'd like to attach tags to posts. Each Post can have anywhere from zero to an unlimited number of Tags, and I wonder if it's possible to do that without having to join tables?

As the number of tags is not limited, I cannot just create n fields (Tag1 to TagN), so another approach (which is apparently the one Stack Overflow takes) is to use one large text field and a delimiter, e.g. "<Tag1><Tag2><Tag3>".

The problem there: if I want to display all posts with a given tag, I would have to use a "LIKE '%<Tag2>%'" statement, and as far as I know such leading-wildcard queries cannot use any indexes, requiring a full table scan.

Is there any suitable way to solve this?

Note: I know that a separate Tag-Link-Table offers benefits and that I should possibly not worry about performance without measuring, etc. I'm more interested in the different ways to design such a system.

+3  A: 

A separate tag table is really the only way to go here: it's the only design that allows an unlimited number of tags.

Nico Burns
+6  A: 

Wanting to do this without joins strikes me as premature optimisation. If this table is accessed frequently, its pages are very likely to be in memory, so you won't incur an I/O penalty reading from it, and the plans for the queries accessing it are likely to be cached.

Ken Keenan
Proper indexing would also help reduce any penalty from the join.
Corey Sunwold
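To illustrate the comment above, here is a minimal sketch of the indexed link table the join-based design implies (table, column, and index names are assumed for illustration, not taken from the question):

```sql
-- Link table between posts and tags:
CREATE TABLE PostTags (
    PostID INT NOT NULL,
    TagID  INT NOT NULL,
    PRIMARY KEY (PostID, TagID)
);
-- Index so "posts by tag" lookups avoid a full scan:
CREATE INDEX IX_PostTags_TagID ON PostTags (TagID);

-- All posts carrying a given tag:
SELECT p.*
FROM Posts p
JOIN PostTags pt ON pt.PostID = p.PostID
JOIN Tags t      ON t.TagID   = pt.TagID
WHERE t.TagName = 'architecture';
```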
A: 

If you're using SQL Server, you could use a single text field (varchar(max) seems appropriate) and full-text indexing. Then just do a full-text search for the tag you're looking for.
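A rough sketch of that setup on SQL Server follows; the catalog name, column name, and key index name are placeholders (SQL Server requires a unique key index on the table before a full-text index can be created):

```sql
-- Full-text catalog and index on the delimited tag column (names assumed):
CREATE FULLTEXT CATALOG TagCatalog;
CREATE FULLTEXT INDEX ON Posts (TagString)
    KEY INDEX PK_Posts ON TagCatalog;

-- Find posts whose TagString contains a given tag:
SELECT PostID, PostTitle
FROM Posts
WHERE CONTAINS(TagString, '"Tag2"');
```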

David Archer
+2  A: 

This sounds like an exercise in denormalization. All that's really needed is a table that can naturally support any query you happen to have, by repeating any information you would otherwise have to join to another table to obtain. A normalized database for something like what you've got might look like:

Posts:
PostID  | PostTitle    | PostBody          | PostAuthor
--------+--------------+-------------------+-------------
1146044 | Join-Free... | I'm working on... | Michael Stum

Tags:
TagID | TagName
------+-------------
1     | Architecture

PostTags:
PostID  | TagID
--------+------
1146044 | 1

Then you can add columns to optimise your queries. If it were me, I'd probably leave the Posts and Tags tables alone and add the extra info to the PostTags join table. What I add might depend a bit on the queries I intend to run, but I'd probably at least add Posts.PostTitle, Posts.PostAuthor, and Tags.TagName, so that showing a blog post needs only two queries:

SELECT * FROM `Posts` WHERE `Posts`.`PostID` = $1 
SELECT * FROM `PostTags` WHERE `PostTags`.`PostID` = $1

And summarizing all the posts for a given tag requires even less:

SELECT * FROM `PostTags` WHERE `PostTags`.`TagName` = $1
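The denormalised PostTags table described above might be declared like this (a sketch; column types and the index name are assumed):

```sql
CREATE TABLE PostTags (
    PostID     INT          NOT NULL,
    TagID      INT          NOT NULL,
    -- Denormalised copies, so tag pages need no join:
    PostTitle  VARCHAR(255) NOT NULL,
    PostAuthor VARCHAR(100) NOT NULL,
    TagName    VARCHAR(50)  NOT NULL,
    PRIMARY KEY (PostID, TagID)
);
CREATE INDEX IX_PostTags_TagName ON PostTags (TagName);
```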

Obviously the downside to denormalization is that you have to do a bit more work to keep the denormalized tables up to date. A typical way of dealing with this is to put sanity checks in your code that detect when a denormalized result is out of sync, by comparing it to other information the code already has available. In the example above, such a check could compare the post titles in the PostTags result set against the title in the Posts result; this doesn't cost an extra query. If there's a mismatch, the program could notify an admin, e.g. by logging the inconsistency or sending an email.
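The same drift could also be surfaced directly in SQL with an audit query (a sketch, not from the answer, using the table names above):

```sql
-- Rows where the denormalised title has drifted from the source of truth:
SELECT pt.PostID,
       pt.PostTitle AS Denormalised,
       p.PostTitle  AS Canonical
FROM PostTags pt
JOIN Posts p ON p.PostID = pt.PostID
WHERE pt.PostTitle <> p.PostTitle;
```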

Fixing it is easy (though costly in terms of server workload): throw out the extra columns and regenerate them from the normalized tables. Obviously you shouldn't do this until you have found the cause of the database going out of sync.
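Regenerating the denormalised columns from the normalized tables could look something like the following sketch (MySQL-style multi-table UPDATE shown; the exact syntax varies by database engine):

```sql
UPDATE PostTags pt
JOIN Posts p ON p.PostID = pt.PostID
JOIN Tags  t ON t.TagID  = pt.TagID
SET pt.PostTitle  = p.PostTitle,
    pt.PostAuthor = p.PostAuthor,
    pt.TagName    = t.TagName;
```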

TokenMacGuy