As a rule, is it better to use native primary keys (i.e. existing columns or a combination of columns), or to set your primary key to an auto-generated column of integers?

EDIT:
It has been pointed out to me that this is very similar to this question.

The consensus here is to use surrogate keys, which was my natural inclination, but my boss told me I should use natural keys where possible. His advice may be best for this particular application, as the Name in a row uniquely identifies it and we need to keep the ability to view old data, so any change to the name/rule is going to mean a new unique row.

While the answers here are all helpful, most of them are based on the subjective "here is what you should do", and do not cite supporting sources. Am I missing some essential reading, or are the best practices of database design highly subjective and/or application dependent?

+3  A: 

Whatever it is, make it non-meaningful (a surrogate key). Meaningful primary keys are deadly.

Otávio Décio
Could you explain this further? What do you mean by "meaningful"?
James McMahon
I think that e-mail can be treated as 'meaningful' primary key
Valentin Vasiliev
Meaningful - typical example: SSN. Keys should carry no intrinsic business meaning.
Otávio Décio
Meaningful means it tells you something when you look at it; when you look at an identity value, you can't tell what it means.
SQLMenace
A: 

I would say auto-generated; there's no real reason not to, in my mind. Unless you're developing some kind of hash table, but even so, I would stick to a unique primary key automatically created by the database. It's quick, simple and reliable. Don't reinvent the wheel if it's already there.

Gary Green
No idea why you got modded down, so I counteracted it even if your answer is repetition.
+1  A: 

It is an old war between purists and pragmatists. Purists don't accept surrogate primary keys and insist on using only natural ones. If you ask me, I'll vote for auto-increment (surrogate keys) in most situations.

Valentin Vasiliev
+5  A: 

A primary key

  1. must identify a row uniquely.
  2. must not contain data, or it will change when your data changes (which is bad)
  3. should be fast in comparing operations (WHERE clauses / joins)

Ideally, you use an artificial (surrogate) key for your rows. A numeric integer data type (INT) is best, because it is space-efficient and fast.
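To make conditions 1 to 3 concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The person table and its email column are hypothetical names chosen for illustration; they are not from the question. It assumes SQLite's INTEGER PRIMARY KEY behaviour, where the database assigns the id on insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id    INTEGER PRIMARY KEY,   -- surrogate key, assigned by SQLite
        email TEXT NOT NULL UNIQUE   -- business data, still kept unique
    )
    """
)

# Insert without specifying id; the database generates it.
cur = conn.execute("INSERT INTO person (email) VALUES (?)", ("old@example.com",))
person_id = cur.lastrowid

# The business value changes, the key stays the same.
conn.execute(
    "UPDATE person SET email = ? WHERE id = ?",
    ("new@example.com", person_id),
)
row = conn.execute(
    "SELECT id, email FROM person WHERE id = ?", (person_id,)
).fetchone()
print(row)
```

The UNIQUE constraint on email keeps the natural uniqueness rule in place, so choosing a surrogate key does not mean giving up that check.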

A primary key should be made of the minimum number of fields that still fulfills conditions 1 to 3. For the vast majority of tables this minimum is one field.

For relation tables (or very special edge cases), it may be higher. Referencing a table with a composite primary key is cumbersome, so a composite key is not recommended for a table that must be referenced on its own.

In relation tables (m:n relations) you make a composite key out of the primary keys of the related tables, hence your composite key automatically fulfills all three conditions from above.
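A minimal sketch of such an m:n relation table, again with Python's sqlite3 module. The student, course and enrollment tables are hypothetical examples, not something from the question; the point is only that the composite key is built from the two referenced surrogate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Relation table: the primary key is the pair of foreign keys.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(id),
        course_id  INTEGER NOT NULL REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student (id, name)  VALUES (1, 'Ann');
    INSERT INTO course  (id, title) VALUES (10, 'Databases');
    INSERT INTO enrollment VALUES (1, 10);
    """
)

# Enrolling the same student in the same course twice violates the key.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```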

You could make primary keys out of data if you are absolutely sure that it will be unique and will never change. Since this is hard to guarantee, I'd recommend against it.
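One way to see why a changing natural key hurts is sketched below, with hypothetical account and login tables in SQLite. It assumes foreign key enforcement is switched on and no ON UPDATE CASCADE is declared; under those assumptions the database refuses to change a key value that other rows still reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clause
conn.executescript(
    """
    -- Natural key: the e-mail address itself is the primary key.
    CREATE TABLE account (email TEXT PRIMARY KEY);

    CREATE TABLE login (
        id            INTEGER PRIMARY KEY,
        account_email TEXT NOT NULL REFERENCES account(email)
    );

    INSERT INTO account VALUES ('old@example.com');
    INSERT INTO login (account_email) VALUES ('old@example.com');
    """
)

# The user changes their address: the key is referenced, so this fails.
try:
    conn.execute("UPDATE account SET email = 'new@example.com'")
    update_blocked = False
except sqlite3.IntegrityError:
    update_blocked = True
print(update_blocked)
```

Declaring ON UPDATE CASCADE would let the update through, but then every referencing row has to be rewritten whenever the data changes, which is the cost the answer warns about.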

Tomalak
How can you make a native primary key if it contains no data?
James McMahon
@nemo: You make an artificial key: a number, or a GUID. Data is the record payload, I do not count the ID as data.
Tomalak
Just a point of correction - surrogate keys are never made out of data - they are immutable.
RedFilter
@OrbMan: Yes, I've reformulated that. I was referring to the process of combining fields into a key, but reading the paragraph again I saw that the wording was misleading.
Tomalak
Oops. I was mixing up "surrogate" and "composite", sorry for that.
Tomalak
@Tomalak: So by definition a key that contains no data is not native, right?
James McMahon
Ignore the ? up above. The original wording of your answer confused me, but I get what you mean.
James McMahon
@Nemo: A "native key" as you define it is "existing columns or combination of columns". For me that reads as "data or a combination of data", which is not a recommendable thing to do. A key should be as abstract as possible, so that you never rely on or use its actual value.
Tomalak
For anything other than row reference and row retrieval, that is.
Tomalak
+2  A: 

Always ints.

You'll appreciate you did so when it's time to cross reference those elements in other tables (using foreign keys)

Assaf Lavie
+5  A: 

This is a pretty common topic.

cletus
Thanks, I did several searches before asking this question, looked at tags etc., but uncovered none of these.
James McMahon
Yeah, the SO search is... not great. You sorta have to know what to look for, or do what I often do and type this into Google: site:stackoverflow.com search terms
cletus