views:

231

answers:

4

The pragmatists have won the argument of surrogate vs. natural primary keys in favor of surrogate keys*. In my own work I always use SQL Server identity columns without a second thought. But it occurs to me that, for a table to be in 1st normal form, I should be able to identify a natural key and enforce it with a unique constraint. I can't do this for all the tables in my database, so my database doesn't even meet the lowest criteria of normalization.

Do you agree that a table with a surrogate primary key must also have a unique constraint on a natural key in order to be in 1NF?

*I think Joe Celko is still fighting the good fight, see the last paragraph.

Edited to add: Thanks for the responses. My impression is that adding a unique constraint is not a common practice, so I'm somewhat surprised that the responses so far have been unanimous.

+3  A: 

Yes!

If the table is supposed to record at most instance of the natural key, you need a constraint on the relevant column(s) to enforce that. Otherwise, you can end up with a table of 50,000,000 rows, each with a different ID value but otherwise identical -- which is pathetic or ludicrous or a travesty, depending on your viewpoint.

Jonathan Leffler
+1  A: 

I could fill a book with all of the issues that IDENTITY columns have caused for my clients. On the bright side, it keeps me working. If everyone designed their databases the right way the first time I'd never have to come in and fix them :)

If you're going to use an IDENTITY for a surrogate key, you need to do exactly that. A surrogate key should be completely behind the scenes. Users should never see them on reports, they should never be able to look up rows with them, etc. In addition, you should have a natural key, which by definition must be unique. That uniqueness should be enforced in your data model.

My current client has decided that it's going to merge several entities in their database. They of course used IDENTITY columns everywhere and displayed them on all of the records on the front end. Now that the rows have to be merged there are collisions which means renumbering. Of course, since the users know these numbers when an entity gets renumbered it will completely throw the user for a loop.

Tom H.
I disagree with the "never see them on reports...never be able to look up rows with them." In our system, we use an identity column for the requisitions. The "Req ID" is the identity column value. Now, our purchase orders use an incrementing number that we control, but the Req ID has proved quite useful and poses no issues for us.
Nick DeVore
The fact that something worked in one situation and hasn't caused a problem *yet* doesn't mean that it's a good idea.
Tom H.
+2  A: 

If you have a natural key and you don't place a unique constraint on it, you are going to create problems in your data integrity. Unfortunately in the real world, not every table has a genuine natural key.

HLGEM
+1  A: 

Let me reframe the question. The interesting point isn't whether the table is or is not in 1NF. The interesting point is whether the natural data is in 1NF.

If you remove the ID column, and then wipe out the duplicates, you've performed what's called a "projection" in relational speak. The result is in 1NF by definition.

If you remove the ID column, but don't wipe out duplicates, you end up with a bag of rows, not a set. If the application somehow doesn't prevent duplicates from getting into that bag, then the bag is not in 1NF.

My answer boils down to the same as Jonathan's and Tom's. The net effect is that you've moved the defense of data integrity back out of the DBMS and back into the application. Application programmers tend to do this, in this case and in many others. If they code their applications right, then it's a matter of preference. If they don't, then they've regressed to the state of the art before databases were introduced.

Walter Mitty