To Fully Enforce Your Data Model or To Not Fully Enforce Your Data Model

http://weblogs.sqlteam.com/jeffs/archive/2008/08/13.aspx:

Consider the following logical data model:
* There are multiple Companies
* Each Company has many Projects
* Each Project has many Tasks
* Each Task has a Status, selected from a global list of pre-defined Statuses.

Let us say that we decide that the primary key of Companies, Projects, Tasks, and Status are all Identity (auto-number) columns, since we would like to auto-generate primary keys for those tables.

So basically, we have 4 tables:
Status (PK: StatusID)
Companies (PK: CompanyID)
Projects (PK: ProjectID, FK: [Companies].CompanyID)
Tasks (PK: TaskID, FK: [Projects].ProjectID, [Status].StatusID).

Now, allow me to add one little wrinkle. Suppose that the available Statuses for each Task is not defined globally, but at the Company level. That is, each Company has its own list of Statuses to which a Task can be assigned.

This means that the Status table now needs a Foreign Key reference to the Companies table (to indicate which Company each Status belongs to):

Companies (PK: CompanyID)
Status (PK: StatusID, FK: [Companies].CompanyID)
Projects (PK: ProjectID, FK: [Companies].CompanyID)
Tasks (PK: TaskID, FK: [Projects].ProjectID, [Status].StatusID).

Are there any other changes we need to make to this data model? Or is simply adding a CompanyID column to the Status table enough to facilitate this change? Remember, our goal as always is full referential integrity using primary and foreign key constraints wherever possible.

Well, there is an issue:

Nothing in this data model stops us from assigning a Status to a Task that is not defined for that Task's parent company. We have no way of enforcing this right now with our current constraints. Our physical data model is flawed.

This is very easily fixed, but it can only be done by violating the "all tables just need an identity primary key" rule.

First, remember this: just because an identity column is unique does not mean that that column cannot be part of a primary key.

He goes on to indicate how composite keys can be used to fully enforce and constrain your data model with something like this:

Companies (PK: CompanyID)
Status (PK: CompanyID, StatusID, FK: [Companies].CompanyID)
Projects (PK: CompanyID, ProjectID, FK: [Companies].CompanyID)
Tasks (PK: TaskID, FK: [Projects].(CompanyID, ProjectID), [Status].(CompanyID, StatusID)).

I have long been a fan of fully enforcing/constraining my data models, however, often I find myself in situations similar to the aforementioned and I come to a cross-roads:

To fully enforce or to not fully enforce.

The obvious down-side to this is the seemingly over-complex design.

Now, I know that there isn't necessarily a "correct" design, but for situations like these.. I'm looking for feedback in terms of best practices.

Advantages, disadvantages and general thoughts on this design or fully enforcing your data model design?

**Note that this question may spawn a debate as to where responsibilities lie in terms of enforcing your data model (database or application or both). For sake of discussion, I believe that your data model should enforce itself - please answer under this presumption. **

I would create a CompanyStatus table which is a many to many between Company and Status and describes which of the statuses are applicable for a given company. Then, tasks get assigned a CompanyStatusID rather than a StatusID.

This also prevents you from having duplicate statuses in your Status table - many companies can share the same Closed status for example, which is better normalization.

So, you do not need to use compound keys in order to properly enforce constraints. I prefer to use single autoincrement primary keys that have no meaning (surrogate keys). That is more robust than making an assumption that a key will be unique (such as SSN), when there is always the chance that will turn out not to be the case, yet you must store the data anyway since the application requires it (so a unique constraint is of no help here).

this is the correct/normalized design, not the one listed in the question, though using StatusId in the Task table would be ok if not better (CompanyStatus just used to filter the initial assignment)

Steven A. Lowe 2009-07-17 19:32:49

I would not do the same with Projects unless more than one Company can be associated with the same Project, and I did not get that from the requirements. I do understand the underlying premise, that you can use composite keys for better integrity than you can get with surrogates. That is something to be considered when designing your model; personally, I have yet to find an occasion where I decided to go that route - I suspect I would be more likely to get errors from mistyping lengthy composite keys in my queries than I would get through the loss of constraints surrogate keys may give you.

RedFilter 2009-07-17 20:00:22

Adding a CompanyStatus table would allow you to have a task with a project from one company and a status from another company. Besides the point though, I'm really after thoughts on best practices in terms of fully enforcing your data model.

Jeremiah 2009-07-17 21:53:15

ansaurus

tags:

views:

answers:

To Fully Enforce Your Data Model or To Not Fully Enforce Your Data Model

related questions