views:

225

answers:

10

Just trying to figure out the best way to design my table for the following scenario:

I have several areas in my system (documents, projects, groups and clients) and each of these can have comments logged against them.

My question is should I have one table like this:

CommentID
DocumentID
ProjectID
GroupID
ClientID
etc

Where only one of the ids will have data and the rest will be NULL or should I have a separate CommentType table and have my comments table like this:

CommentID
CommentTypeID
ResourceID (this being the id of the project/doc/client)
etc

My thoughts are that option 2 would be more efficient from an indexing point of view. Is this correct?

A: 

Of the options you give, I would go for number 2.

Galwegian
+3  A: 

From a foreign key perspective, the first example is better because you can have multiple foreign key constraints on a column but the data has to exist in all those references. It's also more flexible if the business rules change.

OMG Ponies
+1... Good point on the foreign keys problem.
Daniel Vassallo
Not only is it a problem enforcing foreign keys but to get all data back out, you have to join to the same table four times. May as well have four separate comments tables that are specialized as use that because at least you can use the FK constraints then. EAV tables in general don't scale well.
HLGEM
+4  A: 

Read up on database normalization.

Nulls in the way you describe would be a big indication that the database isn't designed properly.

You need to split up all your tables so that the data held in them is fully normalized, this will save you a lot of time further down the line guaranteed, and it's a lot better practice to get into the habit of.

Tom Gullen
Pretty sure I did spell 'normalisation' correctly, I am in the UK.
Tom Gullen
Sorry :) I was correcting a couple of the typos and my spell-checker highlighted them. I thought that might be the case, but I figured maybe I'd help you Brits see the err of your ways ;)
Tom H.
+3  A: 

To continue from @OMG Ponies' answer, what you describe in the second example is called a Polymorphic Association, where the foreign key ResourceID may reference rows in more than one table. However in SQL databases, a foreign key constraint can only reference exactly one table. The database cannot enforce the foreign key according to the value in CommentTypeID.

You may be interested in checking out the following Stack Overflow post for one solution to tackle this problem:

Daniel Vassallo
+2  A: 

The first approach is not great, since it is quite denormalized. Each time you add a new entity type, you need to update the table. You may be better off making this an attribute of document - I.e. store the comment inline in the document table.

For the ResourceID approach to work with referential integrity, you will need to have a Resource table, and a ResourceID foreign key in all of your Document, Project etc.. entities (or use a mapping table.) Making "ResourceID" a jack-of-all-trades, that can be a documentID, projectID etc.. is not a good solution since it cannot be used for sensible indexing or foreign key constraint.

To normalize, you need to the comment table into one table per resource type.

Comment
-------
CommentID
CommentText
...etc 

DocumentComment
---------------
DocumentID
CommentID

ProjectComment
--------------
ProjectID
CommentID

If only one comment is allowed, then you add a unique constraint on the foreign key for the entity (DocumentID, ProjectID etc.) This ensures that there can only be one row for the given item and so only one comment. You can also ensure that comments are not shared by using a unique constraint on CommentID.

EDIT: Interestingly, this is almost parallel to the normalized implementation of ResourceID - replace "Comment" in the table name, with "Resource" and change "CommentID" to "ResourceID" and you have the structure needed to associate a ResourceID with each resource. You can then use a single table "ResourceComment".

If there are going to be other entities that are associated with any type of resource (e.g. audit details, access rights, etc..), then using the resource mapping tables is the way to go, since it will allow you to add normalized comments and any other resource related entities.

mdma
This doesn't prevent sharing of comments between differing entity types though (if that's a requirement). A comment ID could be in both DocumentComments and ProjectComments
Tom H.
@Tom - That's correct, and as far as I know, there is no relational way to enforce this other than by using triggers to check after modification, or if available, a materialized view that UNIONs all resourceIDs from the Document, Project etc. tables and applies a uniqueness constriant.
mdma
Someone had posted a link (looks like they removed it) with a description of how to super type the tables. I don't think it would be appropriate in this case, but something like that would allow for the enforcement we're talking about.
Tom H.
@Tom, I think I had the link originally ( [this one](http://consultingblogs.emc.com/davidportas/archive/2007/01/08/Distributed-Keys-and-Disjoint-Subtypes.aspx) ). I removed it because it didn't exactly describe the same situation. In addition, @Bill Karwin described it better [in the other answer](http://stackoverflow.com/questions/3055229/generic-database-table-design/3055530#3055530).
Daniel Vassallo
A: 

Option 2 is a good way to go. The issue that I see with that is you are putting the resouce key on that table. Each of the IDs from the different resources could be duplicated. When you join resources to the comments you will more than likely come up with comments that do not belong to that particular resouce. This would be considered a many to many join. I would think a better option would be to have your resource tables, the comments table, and then tables that cross reference the resource type and the comments table.

edgel1k
My queries on this table will always be something like:SELECT * FROM tblCommentsWHERE CommentTypeID = 1 AND ResourceID = 1244
Gazeth
A: 

If you carry the same sort of data about all comments regardless of what they are comments about, I'd vote against creating multiple comment tables. Maybe a comment is just "thing it's about" and text, but if you don't have other data now, it's likely you will: date the comment was entered, user id of person who made it, etc. With multiple tables, you have to repeat all these column definitions for each table.

As noted, using a single reference field means that you could not put a foreign key constraint on it. This is too bad, but it doesn't break anything, it just means you have to do the validation with a trigger or in code. More seriously, joins get difficult. You can just say "from comment join document using (documentid)". You need a complex join based on the value of the type field.

So while the multiple pointer fields is ugly, I tend to think that's the right way to go. I know some db people say there should never be a null field in a table, that you should always break it off into another table to prevent that from happening, but I fail to see any real advantage to following this rule.

Personally I'd be open to hearing further discussion on pros and cons.

Jay
+3  A: 

Option 2 is not a good solution for a relational database. It's called polymorphic associations (as mentioned by @Daniel Vassallo) and it breaks the fundamental definition of a relation.

For example, suppose you have a ResourceId of 1234 on two different rows. Do these represent the same resource? It depends on whether the CommentTypeId is the same on these two rows. This violates the concept of a type in a relation. See SQL and Relational Theory by C. J. Date for more details.

Another clue that it's a broken design is that you can't declare a foreign key constraint for ResourceId, because it could point to any of several tables. If you try to enforce referential integrity using triggers or something, you find yourself rewriting the trigger every time you add a new type of commentable resource.

I would solve this with the solution that @mdma briefly mentions (but then ignores):

CREATE TABLE Commentable (
  ResourceId INT NOT NULL IDENTITY,
  ResourceType INT NOT NULL,
  PRIMARY KEY (ResourceId, ResourceType)
);

CREATE TABLE Documents (
  ResourceId INT NOT NULL,
  ResourceType INT NOT NULL CHECK (ResourceType = 1),
  FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);

CREATE TABLE Projects (
  ResourceId INT NOT NULL,
  ResourceType INT NOT NULL CHECK (ResourceType = 2),
  FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);

Now each resource type has its own table, but the serial primary key is allocated uniquely by Commentable. A given primary key value can be used only by one resource type.

CREATE TABLE Comments (
  CommentId INT IDENTITY PRIMARY KEY,
  ResourceId INT NOT NULL,
  ResourceType INT NOT NULL,
  FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);

Now Comments reference Commentable resources, with referential integrity enforced. A given comment can reference only one resource type. There's no possibility of anomalies or conflicting resource ids.

I cover more about polymorphic associations in my presentation Practical Object-Oriented Models in SQL and my book SQL Antipatterns.

Bill Karwin
Kudos for SQL Antipatterns, Bill :) That is a great piece of work. I got the ebook last week, and currently reading one chapter a day. My bookmark is on the polymorphic associations chapter incidentally :) ... congrats!
Daniel Vassallo
+1  A: 

I wouldn't go with either of those solutions. Depending on some of the specifics of your requirements you could go with a super-type table:

CREATE TABLE Commentable_Items (
    commentable_item_id    INT    NOT NULL,
    CONSTRAINT PK_Commentable_Items PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Projects (
    commentable_item_id    INT    NOT NULL,
    ... (other project columns)
    CONSTRAINT PK_Projects PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Documents (
    commentable_item_id    INT    NOT NULL,
    ... (other document columns)
    CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (commentable_item_id))
GO

If the each item can only have one comment and comments are not shared (i.e. a comment can only belong to one entity) then you could just put the comments in the Commentable_Items table. Otherwise you could link the comments off of that table with a foreign key.

I don't like this approach very much in your specific case though, because "having comments" isn't enough to put items together like that in my mind.

I would probably go with separate Comments tables (assuming that you can have multiple comments per item - otherwise just put them in your base tables). If a comment can be shared between multiple entity types (i.e., a document and a project can share the same comment) then have a central Comments table and multiple entity-comment relationship tables:

CREATE TABLE Comments (
    comment_id    INT            NOT NULL,
    comment_text  NVARCHAR(MAX)  NOT NULL,
    CONSTRAINT PK_Comments PRIMARY KEY CLUSTERED (comment_id))
GO
CREATE TABLE Document_Comments (
    document_id    INT    NOT NULL,
    comment_id     INT    NOT NULL,
    CONSTRAINT PK_Document_Comments PRIMARY KEY CLUSTERED (document_id, comment_id))
GO
CREATE TABLE Project_Comments (
    project_id     INT    NOT NULL,
    comment_id     INT    NOT NULL,
    CONSTRAINT PK_Project_Comments PRIMARY KEY CLUSTERED (project_id, comment_id))
GO

If you want to constrain comments to a single document (for example) then you could add a unique index (or change the primary key) on the comment_id within that linking table.

It's all of these "little" decisions that will affect the specific PKs and FKs. I like this approach because each table is clear on what it is. In databases that's usually better then having "generic" tables/solutions.

Tom H.
A: 

Pawnshop Application:

I have separate tables for Loan, Purchase, Inventory & Sales transactions. Each tables rows are joined to their respective customer rows by:

customer.pk [serial] = loan.fk [integer];
                     = purchase.fk [integer];
                     = inventory.fk [integer];
                     = sale.fk [integer]; 

I have consolidated the four tables into one table called "transaction", where a column:

transaction.trx_type char(1) {L=Loan, P=Purchase, I=Inventory, S=Sale}

Scenario:

A customer initially pawns merchandise, makes a couple of interest payments, then decides he wants to sell the merchandise to the pawnshop, who then places merchandise in Inventory and eventually sells it to another customer.

I designed a generic transaction table where for example:

transaction.main_amount DECIMAL(7,2)

in a loan transaction holds the pawn amount, in a purchase holds the purchase price, in inventory and sale holds sale price.

This is clearly a denormalized design, but has made programming alot easier and improved performance. Any type of transaction can now be performed from within one screen, without the need to change to different tables.

Frank Computer