http://weblogs.sqlteam.com/jeffs/archive/2008/08/13.aspx:
Consider the following logical data model:
* There are multiple Companies
* Each Company has many Projects
* Each Project has many Tasks
* Each Task has a Status, selected from a global list of pre-defined Statuses.Let us say that we decide that the primary key of Companies, Projects, Tasks, and Status are all Identity (auto-number) columns, since we would like to auto-generate primary keys for those tables.
So basically, we have 4 tables:
Status (PK: StatusID)
Companies (PK: CompanyID)
Projects (PK: ProjectID, FK: [Companies].CompanyID)
Tasks (PK: TaskID, FK: [Projects].ProjectID, [Status].StatusID).
Now, allow me to add one little wrinkle. Suppose that the available Statuses for each Task is not defined globally, but at the Company level. That is, each Company has its own list of Statuses to which a Task can be assigned.
This means that the Status table now needs a Foreign Key reference to the Companies table (to indicate which Company each Status belongs to):
Companies (PK: CompanyID)
Status (PK: StatusID, FK: [Companies].CompanyID)
Projects (PK: ProjectID, FK: [Companies].CompanyID)
Tasks (PK: TaskID, FK: [Projects].ProjectID, [Status].StatusID).
Are there any other changes we need to make to this data model? Or is simply adding a CompanyID column to the Status table enough to facilitate this change? Remember, our goal as always is full referential integrity using primary and foreign key constraints wherever possible.
Well, there is an issue:
Nothing in this data model stops us from assigning a Status to a Task that is not defined for that Task's parent company. We have no way of enforcing this right now with our current constraints. Our physical data model is flawed.
This is very easily fixed, but it can only be done by violating the "all tables just need an identity primary key" rule.
First, remember this: just because an identity column is unique does not mean that that column cannot be part of a primary key.
He goes on to indicate how composite keys can be used to fully enforce and constrain your data model with something like this:
Companies (PK: CompanyID)
Status (PK: CompanyID, StatusID, FK: [Companies].CompanyID)
Projects (PK: CompanyID, ProjectID, FK: [Companies].CompanyID)
Tasks (PK: TaskID, FK: [Projects].(CompanyID, ProjectID), [Status].(CompanyID, StatusID)).
I have long been a fan of fully enforcing/constraining my data models, however, often I find myself in situations similar to the aforementioned and I come to a cross-roads:
To fully enforce or to not fully enforce.
The obvious down-side to this is the seemingly over-complex design.
Now, I know that there isn't necessarily a "correct" design, but for situations like these.. I'm looking for feedback in terms of best practices.
Advantages, disadvantages and general thoughts on this design or fully enforcing your data model design?
**Note that this question may spawn a debate as to where responsibilities lie in terms of enforcing your data model (database or application or both). For sake of discussion, I believe that your data model should enforce itself - please answer under this presumption. **