views:

2418

answers:

16

Here we go again, the old argument still arises...

Would we better have a business key as a primary key, or would we rather have a surrogate id (i.e. an SQL Server identity) with a unique constraint on the business key field?

Please, provide examples or proof to support your theory.

+5  A: 

Alway use a key that has no business meaning. It's just good practice.

EDIT: I was trying to find a link to it online, but I couldn't. However in 'Patterns of Enterprise Archtecture' [Fowler] it has a good explanation of why you shouldn't use anything other than a key with no meaning other than being a key. It boils down to the fact that it should have one job and one job only.

IainMH
Martin Fowler may be many things, but he isn't an authority on database design.
Tony Andrews
+6  A: 

Surrogate key will NEVER have a reason to change. I cannot say the same about the natural keys. Last names, emails, ISBN nubmers - they all can change one day.

Rimantas
+1  A: 

On a datawarehouse scenario I belive is better to follow the surrogate key path. Two reasons:

  • You are independent of the source system, and changes there --such as a data type change-- won't affect you.
  • Your DW will need less phisical space since you will use only integer data types for your surrogate keys. Also your indexes will work better.
santiiiii
+2  A: 

Using a surrogate key is better in my opinion as there is zero chance of it changing. Almost anything I can think of which you might use as a natural key could change (disclaimer: not always true, but commonly).

An example might be a DB of cars - on first glance, you might think that the licence plate could be used as the key. But these could be changed so that'd be a bad idea. You wouldnt really want to find that out after releasing the app, when someone comes to you wanting to know why they can't change their number plate to their shiny new personalised one.

Splash
Unfortunately cars do have a natural key that doesn't change: the VIN (at least in America...)
jcollum
@jcollum Yes ok, that is a fair point. My opinion still stands though, my example was not necessarily as good as it could be.
Splash
+8  A: 

Surrogate keys (typically integers) have the added-value of making your table relations faster, and more economic in storage and update speed (even better, foreign keys do not need to be updated when using surrogate keys, in contrast with business key fields, that do change now and then).

A table's primary key should be used for identifying uniquely the row, mainly for join purposes. Think a Persons table: names can change, and they're not guaranteed unique.

Think Companies: you're a happy Merkin company doing business with other companies in Merkia. You are clever enough not to use the company name as the primary key, so you use Merkia's government's unique company ID in its entirety of 10 alphanumeric characters. Then Merkia changes the company IDs because they thought it would be a good idea. It's ok, you use your db engine's cascaded updates feature, for a change that shouldn't involve you in the first place. Later on, your business expands, and now you work with a company in Freedonia. Freedonian company id are up to 16 characters. You need to enlarge the company id primary key (also the foreign key fields in Orders, Issues, MoneyTransfers etc), adding a Country field in the primary key (also in the foreign keys). Ouch! Civil war in Freedonia, it's split in three countries. The country name of your associate should be changed to the new one; cascaded updates to the rescue. BTW, what's your primary key? (Country, CompanyID) or (CompanyID, Country)? The latter helps joins, the former avoids another index (or perhaps many, should you want your Orders grouped by country too).

All these are not proof, but an indication that a surrogate key to uniquely identify a row for all uses, including join operations, is preferable to a business key.

ΤΖΩΤΖΙΟΥ
You win all the internets with the coolest looking username!
IainMH
If my mother-in-law read my post, she would think: "he didn't say he is pro-business keys, therefore he is totally against unique business keys, so he shouldn't marry my daughter!"; but she won't read it. I believe I was downvoted because people didn't agree with me, not because it was unhelpful.
ΤΖΩΤΖΙΟΥ
That's pretty much what a downvote is: "I don't agree with this."
jcollum
The tooltip of the down arrow says "This answer is not useful", not "I don't agree with this". Perhaps in this specific answer the meanings are close, but they are not generally the same.
ΤΖΩΤΖΙΟΥ
@jcollum: I believe you never read my previous comment.
ΤΖΩΤΖΙΟΥ
+16  A: 

Just a few reasons for maintaining surrogate keys:

1) Stability: Changing a key because of a business or natural need will negatively affect related tables. Surrogate keys rarely, if ever, need to be changed because there is nothing tied to the value.

2) Convention: Allows you to have a standardized Primary Key column naming convention rather than having to think about how to join tables with various names for their PKs.

3) Speed: Depending on the PK value and type, a surrogate key of an integer may be smaller, faster to index and search.

Jay Shepherd
+1  A: 

Short answer: use surrogate key whenever possible.

Reasons:

  1. you can often tell something about a business key just by looking at it. This can be a good thing. This does not beat point 2.

  2. business keys can be changed, or reused (think company merger and acquisition, if at all in doubt).

  3. If it's warehouse or business intelligence (BI) data, surrogate keys because Mr Kimball said so.

Mark Regensberg
why "whenever possible". Do you have situations in mind where it's not possible?
Brann
technically, a surrogate key should always be possible. I guess the "whenever" is because there is no accounting for the human factor...
Mark Regensberg
+2  A: 

This is one of those cases where a surrogate kay pretty much always makes sense. There are cases where you either chose what's best for the database or what's best for your object model, but in both cases, using a meaningless key or Guid is a better idea. It makes indexing easier and faster, and it is an idenity for your object that doesn't change.

Charles Graham
+3  A: 

Surrogate keys are quite handy if you plan to use an ORM tool to handle/generate your data classes. While you can use composite keys with some of the more advanced mappers (read: hibernate), it adds some complexity to your code.

(Of course, database purists will argue that even the notion of a surrogate key is an abomination.)

I'm a fan of using uids for surrogate keys when suitable. The major win with them is that you know the key in advance e.g. you can create an instance of a class with the ID already set and guaranteed to be unique whereas with, say, an integer key you'll need to default to 0 or -1 and update to an appropriate value when you save/update.

UIDs have penalties in terms of lookup and join speed though so it depends on the application in question as to whether they're desirable.

Derek Lawless
+2  A: 

Always use a single column, surrogate key if at all possible. This makes joins as well as inserts/updates/deletes much cleaner because you're only responsible for tracking a single piece of information to maintain the record.

Then, as needed, stack your business keys as unique contraints or indexes. This will keep you data integrity intact.

Business logic/natural keys can change, but the phisical key of a table should NEVER change.

+17  A: 

Both. Have your cake and eat it.

Remember there is nothing special about a primary key, except that it is labelled as such. It is nothing more than a NOT NULL UNIQUE constraint, and a table can have more than one.

If you use a surrogate key, you still want a business key to ensure uniqueness according to the business rules.

Ted
If you have multiple "candidate" keys (fields or same-size collections of fields that are NOT NULL UNIQUE) then you are likely in violation of Boyce-Codd Normal Form. BCNF is beyond 3NF, so not many people worry about it. There are situations, however, where being in BCNF is very helpful.
Alan
Exactly, Ted! Thank God there are some real database people here. One thing that annoys the heck out of me is when someone creates a table with a surrogate key but no business key, even when one is staring them in the face!
Gregory Higley
A surrogate key is extremely useful for dealing with *single*-column relations and for applications which need to deal with related tables. A surrogate key in a common format is, again, useful for such things. But of course this doesn't mean eliminate business constraints.
Justice
Agreed. The real question should be: Should I add a unique surrogate key to my tables? An entirely other question is what to use for a logical primary key. They are both essentially just non-null unique index constraints.
le dorfier
"Every problem is solved with another level of indirection"... Surrogate keys are just that : *another* indirection level
Steve Schnepp
I find it odd that many comments seem to assert that one cannot setup a relationship without a surrogate key. In many cases, the surrogate key is superfluous. Why add something that brings no value but adds technical debt (and in some cases, causes an otherwise unique result to suddenly become non-unique).
wilmoore
+6  A: 

It appears that no one has yet said anything in support of non-surrogate (I hesitate to say "natural") keys. So here goes...

A disadvantage of surrogate keys is that they are meaningless (cited as an advantage by some, but...). This sometimes forces you to join a lot more tables into your query than should really be necessary. Compare:

select sum(t.hours)
from timesheets t
where t.dept_code = 'HR'
and t.status = 'VALID'
and t.project_code = 'MYPROJECT'
and t.task = 'BUILD';

against:

select sum(t.hours)
from timesheets t
     join departents d on d.dept_id = t.dept_id
     join timesheet_statues s on s.status_id = t.status_id
     join projects p on p.project_id = t.project_id
     join tasks k on k.task_id = t.task_id
where d.dept_code = 'HR'
and s.status = 'VALID'
and p.project_code = 'MYPROJECT'
and k.task_code = 'BUILD';

Unless anyone seriously thinks the following is a good idea?:

select sum(t.hours)
from timesheets t
where t.dept_id = 34394
and t.status_id = 89    
and t.project_id = 1253
and t.task_id = 77;

"But" someone will say, "what happens when the code for MYPROJECT or VALID or HR changes?" To which my answer would be: "why would you need to change it?" These aren't "natural" keys in the sense that some outside body is going to legislate that henceforth 'VALID' should be re-coded as 'GOOD'. Only a small percentage of "natural" keys really fall into that category - SSN and Zip code being the usual examples. I would definitely use a meaningless numeric key for tables like Person, Address - but not for everything, which for some reason most people here seem to advocate.

See also: my answer to another question

Tony Andrews
A: 

Surrogate keys can be useful when business information can change or be identical. Business names don't have to be unique across the country, after all. Suppose you deal with two businesses named Smith Electronics, one in Kansas and one in Michigan. You can distinguish them by address, but that'll change. Even the state can change; what if Smith Electronics of Kansas City, Kansas moves across the river to Kansas City, Missouri? There's no obvious way of keeping these businesses distinct with natural key information, so a surrogate key is very useful.

Think of the surrogate key like an ISBN number. Usually, you identify a book by title and author. However, I've got two books titled "Pearl Harbor" by H. P. Willmott, and they're definitely different books, not just different editions. In a case like that, I could refer to the looks of the books, or the earlier versus the later, but it's just as well I have the ISBN to fall back on.

David Thornley
I think I have to disagree with your example here. An ISBN number is an attribute of a book. A surrogate key is independent of the rest of the row data, therefore this position would advocate using a separate surrogate key for a book table, even though the ISBN already uniquely identifies every book.
Christopher Cashell
Alternately, think of the ISBN as a surrogate key itself. It's an identifier with no meaning, just a code that is applied to a specific book. If you're making a books table, the ISBN may as well be the primary key (assuming you have and always will have one book per row).
David Thornley
A: 

In the case of point in time database it is best to have combination of surrogate and natural keys. e.g. you need to track a member information for a club. Some attributes of a member never change. e.g Date of Birth but name can change. So create a Member table with a member_id surrogate key and have a column for DOB. Create another table called person name and have columns for member_id, member_fname, member_lname, date_updated. In this table the natural key would be member_id + date_updated.

+2  A: 

I hate surrogate keys in general. They should only be used when there is no quality natural key available. It is rather absurd when you think about it, to think that adding meaningless data to your table could make things better.

Here are my reasons:

1.) When using natural keys, tables are clustered in the way that they are most often searched thus making queries faster.

2.) When using surrogate keys you must add unique indexes on logical key columns. You still need to prevent logical duplicate data. For example, you can’t allow two Organizations with the same name in your Organization table even though the pk is a surrogate id column.

3.) When surrogate keys are used as the primary key it is much less clear what the natural primary keys are. When developing you want to know what set of columns make the table unique.

4.) In one to many relationship chains, the logical key chains. So for example, Organizations have many Accounts and Accounts have many Invoices. So the logical-key of Organization is OrgName. The logical-key of Accounts is OrgName, AccountID. The logical-key of Invoice is OrgName, AccountID, InvoiceNumber.

When surrogate keys are used, the key chains are truncated by only having a foreign key to the immediate parent. For example, the Invoice table does not have an OrgName column. It only has a column for the AccountID. If you want to search for invoices for a given organization, then you will need to join the Organization, Account, and Invoice tables. If you use logical keys, then you could Query the Organization table directly.

5.) Storing surrogate key values of lookup tables causes tables to be filled with meaningless integers. To view the data, complex views must be created that join to all of the lookup tables. A lookup table is meant to hold a set of acceptable values for a column. It should not be codified by storing an integer surrogate key instead. There is nothing in the normalization rules that suggest that you should store a surrogate integer instead of the value itself.

6.) I have three different database books. Not one of them shows using surrogate keys.

Ken
Well said, Sir.
Christopher Cashell
I hate surrogate keys, except when they are necessary. They are necessary when the enterprise uses a natural key that is subject to a lot of errors, and are unwilling to tolerate a database that's impacted by those errors.
Walter Mitty
+3  A: 

One link

Nicolas Dorier