There's a healthy debate out there about surrogate versus natural keys.
My opinion, which seems to be in line with the majority (a slim one), is that you should use a surrogate key unless a natural key is completely obvious and guaranteed not to change, and that you should still enforce uniqueness on the natural key. In practice that means surrogate keys almost all of the time.
Example of the two approaches, starting with a Company table:
1: Surrogate key: Table has an ID field which is the PK (and an identity). Company names are required to be unique by state, so there's a unique constraint there.
2: Natural key: Table uses CompanyName and State as the PK -- satisfies both the PK and uniqueness.
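To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The names CompanySurrogate, CompanyNatural, OfficeSurrogate, OfficeNatural and the City column are made up for illustration, and SQLite's INTEGER PRIMARY KEY plays the role of an identity column. It is only meant to show the shape of each design, not any particular engine's behavior.

```python
import sqlite3

def make_company_db():
    """Build an in-memory database holding both Company designs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        -- Approach 1: surrogate key, uniqueness enforced on the natural key.
        CREATE TABLE CompanySurrogate (
            ID          INTEGER PRIMARY KEY,   -- stand-in for an identity column
            CompanyName TEXT NOT NULL,
            State       TEXT NOT NULL,
            UNIQUE (CompanyName, State)
        );
        -- Hypothetical child table: carries one integer per reference.
        CREATE TABLE OfficeSurrogate (
            ID        INTEGER PRIMARY KEY,
            CompanyID INTEGER NOT NULL REFERENCES CompanySurrogate (ID),
            City      TEXT NOT NULL
        );

        -- Approach 2: natural key, the PK itself enforces uniqueness.
        CREATE TABLE CompanyNatural (
            CompanyName TEXT NOT NULL,
            State       TEXT NOT NULL,
            PRIMARY KEY (CompanyName, State)
        );
        -- Hypothetical child table: must repeat both key columns.
        CREATE TABLE OfficeNatural (
            ID          INTEGER PRIMARY KEY,
            CompanyName TEXT NOT NULL,
            State       TEXT NOT NULL,
            City        TEXT NOT NULL,
            FOREIGN KEY (CompanyName, State)
                REFERENCES CompanyNatural (CompanyName, State)
        );
    """)
    return conn

def duplicate_rejected(conn, table):
    """Insert the same (CompanyName, State) twice; True if the 2nd insert fails."""
    sql = f"INSERT INTO {table} (CompanyName, State) VALUES (?, ?)"
    conn.execute(sql, ("Acme", "OH"))
    try:
        conn.execute(sql, ("Acme", "OH"))
    except sqlite3.IntegrityError:
        return True
    return False
```

Both designs reject a duplicate company name within a state. The difference is what every referencing table has to carry: one integer in the first case, two text columns in the second.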
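Let's say that the Company PK is used in 10 other tables. My hypothesis, with no numbers to back it up, is that the surrogate key approach would be much faster here, presumably because each of those tables would store and join on a single integer rather than a CompanyName/State pair.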
The only convincing argument I've seen for a natural key is a many-to-many table that uses its two foreign keys as the primary key. I think it makes sense in that case, although you can get into trouble if you need to refactor later. That's out of scope for this post, I think.
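For reference, a minimal sketch of that many-to-many case, again using Python's sqlite3 module as a stand-in. The Product and CompanyProduct tables are hypothetical and exist only for the example. The link table has no ID of its own: its primary key is the pair of foreign keys.

```python
import sqlite3

def make_link_db():
    """Two parent tables plus a link table keyed on its two foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE Company (ID INTEGER PRIMARY KEY, CompanyName TEXT NOT NULL);
        CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL);

        -- Hypothetical many-to-many table: the FK pair is the whole key.
        CREATE TABLE CompanyProduct (
            CompanyID INTEGER NOT NULL REFERENCES Company (ID),
            ProductID INTEGER NOT NULL REFERENCES Product (ID),
            PRIMARY KEY (CompanyID, ProductID)
        );
    """)
    return conn

def link(conn, company_id, product_id):
    """Return True if the pair was stored, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO CompanyProduct (CompanyID, ProductID) VALUES (?, ?)",
            (company_id, product_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The composite key rejects a repeated pair without needing a separate unique constraint, which is the appeal here.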
Has anyone seen an article that compares performance differences on a set of tables that use surrogate keys vs. the same set of tables using natural keys? Looking around on SO and Google hasn't yielded anything worthwhile, just a lot of theorycrafting.
Important Update: I've started building a set of test tables to answer this question. They look like this:
- PartNatural - parts table that uses the unique PartNumber as a PK
- PartSurrogate - parts table that uses an ID (int, identity) as PK and has a unique index on the PartNumber
- Plant - ID (int, identity) as PK
- Engineer - ID (int, identity) as PK
Every part is joined to a plant and every instance of a part at a plant is joined to an engineer. If anyone has an issue with this testbed, now's the time.
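A rough sketch of how such a testbed could be wired up, using Python's sqlite3 module as a stand-in. The link tables PlantPartNatural and PlantPartSurrogate, the Name columns, the generated part numbers, and the helper functions are all assumptions of mine, as is the reading that each part-at-plant row points to exactly one engineer. Timings from an in-memory SQLite database will not transfer to another engine, so treat this as a harness shape only.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE Plant    (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Engineer (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

CREATE TABLE PartNatural   (PartNumber TEXT NOT NULL PRIMARY KEY);
CREATE TABLE PartSurrogate (ID INTEGER PRIMARY KEY,
                            PartNumber TEXT NOT NULL UNIQUE);

-- Hypothetical link tables: one row per part at a plant, with its engineer.
CREATE TABLE PlantPartNatural (
    PartNumber TEXT    NOT NULL REFERENCES PartNatural (PartNumber),
    PlantID    INTEGER NOT NULL REFERENCES Plant (ID),
    EngineerID INTEGER NOT NULL REFERENCES Engineer (ID),
    PRIMARY KEY (PartNumber, PlantID)
);
CREATE TABLE PlantPartSurrogate (
    PartID     INTEGER NOT NULL REFERENCES PartSurrogate (ID),
    PlantID    INTEGER NOT NULL REFERENCES Plant (ID),
    EngineerID INTEGER NOT NULL REFERENCES Engineer (ID),
    PRIMARY KEY (PartID, PlantID)
);
"""

# The same three-way join, once per key style.
QUERIES = {
    "natural": """
        SELECT p.PartNumber, pl.Name, e.Name
        FROM PartNatural p
        JOIN PlantPartNatural pp ON pp.PartNumber = p.PartNumber
        JOIN Plant pl   ON pl.ID = pp.PlantID
        JOIN Engineer e ON e.ID  = pp.EngineerID""",
    "surrogate": """
        SELECT p.PartNumber, pl.Name, e.Name
        FROM PartSurrogate p
        JOIN PlantPartSurrogate pp ON pp.PartID = p.ID
        JOIN Plant pl   ON pl.ID = pp.PlantID
        JOIN Engineer e ON e.ID  = pp.EngineerID""",
}

def build_testbed(n_parts=1000, n_plants=5, n_engineers=20):
    """Create both variants with identical data: every part at every plant."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Plant (ID, Name) VALUES (?, ?)",
                     [(i, f"Plant {i}") for i in range(1, n_plants + 1)])
    conn.executemany("INSERT INTO Engineer (ID, Name) VALUES (?, ?)",
                     [(i, f"Engineer {i}") for i in range(1, n_engineers + 1)])
    for part_id in range(1, n_parts + 1):
        part_number = f"PN-{part_id:08d}"  # made-up part number format
        conn.execute("INSERT INTO PartNatural (PartNumber) VALUES (?)",
                     (part_number,))
        conn.execute("INSERT INTO PartSurrogate (ID, PartNumber) VALUES (?, ?)",
                     (part_id, part_number))
        for plant_id in range(1, n_plants + 1):
            engineer_id = (part_id + plant_id) % n_engineers + 1
            conn.execute("INSERT INTO PlantPartNatural VALUES (?, ?, ?)",
                         (part_number, plant_id, engineer_id))
            conn.execute("INSERT INTO PlantPartSurrogate VALUES (?, ?, ?)",
                         (part_id, plant_id, engineer_id))
    conn.commit()
    return conn

def time_query(conn, sql, repeats=5):
    """Run the query several times; return (row count, best elapsed seconds)."""
    best, rows = float("inf"), []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return len(rows), best
```

Calling `time_query(conn, QUERIES["natural"])` and `time_query(conn, QUERIES["surrogate"])` on the same connection gives a like-for-like comparison, since both variants hold the same rows and differ only in key style.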