views:

118

answers:

5

Possible Duplicate:
Surrogate Vs. Natural/Business Keys

The title says it all.

If given two tables, Level1 [ID, Title, Number] Level2 [ID, FKID???, Title, Number]

A user of the system understands that Level2 is related to Level1 with based on Level1's Number, my question to you is, would you make the relationship based on the internal ID and "emulate" the relationship with "Number" or would you simply use the "Number" field and be done with it?

+2  A: 

The two standard reasons for relating on database ID rather than natural ID are:

  1. It's difficult to guarantee that the natural ID will never change under any circumstance
  2. Generally a natural ID takes more space and is not as efficient to index as the database ID (though of course this is not hard and fast - it depends on the data constituting your natural ID). By using the database ID, you can avoid duplication of business data.
danben
+1 exactly! Natural keys are hardly **ever** really guaranteed to be unique and immutable
marc_s
About your second point, "avoid duplication of business data" - to ensure business data is not duplicated we will need to do a search - and that search cannot be on the surrogate-key, because in the user's world (i.e. the end-user's world) - its the natural key that exists.. In many systems, the surrogate key/sequence generated is kept entirely hidden from the user.Bottom line a lookup on the natural key is required even in those cases. I am curious to see what you think of that trade-off.
blispr
+1  A: 

This discussion has been beaten to death around here. Few people defend natural keys only and most encourage surrogate key usage. Some say you can have both, which is true, of course (use natural keys so business rules are enforced.)

I tend to think twice if the natural key is really a key (most of the time it isn't.). If it is, then I use it.

But again, most natural keys in reality aren't keys and will have duplicates due to various reasons. For example, in countries where national ID cards are issued, there is not uncommon to find two people with the same id number.

That said, if I'd go the surrogate key route, I'd set up the tables like this

Level1 [ID, Title, Number] Level2 [ID, FKID references Level1]

No need to store the title and number twice.

Vinko Vrsalovic
A: 

I think this has been covered many times.

It is almost always safer to use an IDENTITY/AUTONUMBER as a foreign key.

Most natural keys can be duplicated (even if in error).

In my country we even have issues with Personal ID numbers of people X-).

astander
A: 

If the 'Number' column can be used to correlate the two tables in a natural way, I would not force the relationship based on an internal (surrogate) key.

On one hand, the 'Number' column might be good enough, so you wouldn't waste space with the ID column. On the other hand, if the 'Number' column does not have the right characteristics (might change, might be hard to index, etc), then the surrogate ID might be a better option.

It all boils down to the semantics of the 'Number' column.

Salo
A: 
  • Sometimes its best to use a natural key if (for example SSN , or employee ID) you are assured of key stability.

  • With surrogate keys, an FK is essentially a relationship from one system generated #, to another system generated #. In an extreme case, such an artificial relationship may be completely out of sync with reality.

  • Other posters have commented on the mess that happens when a natural key has to change. Sure thats a problem. But its worth examining how frequently these exceptions will happen for you - should the tail (exceptions) wag the dog (your design) - in every case?

  • I am not fanatic about this, but I do agree its inconvenient to have differing conventions within the same db with a mix of natural and surrogate keys.

  • If the Natural key is multiple fields , say DateOfBirth + LastName + Department (or some such wierd thing), then you are better off using Artificial keys.

All the above arguments are at the logical level of a data model. At the physical level your situation, servers, and performance constraints will dictate your decisions. Its best not to prematurely optimize.

blispr
but even an SSN is not guaranteed to be unique..... how do you want to use that as a primary key then???
marc_s
If an SSN # is not unique, there is no logical basis for using it as a key.. Perhaps that was a bad example. Instead, consider a VIN # on a car.. Would you use the VIN as the PK, or would you create a new ID column?
blispr