views:

185

answers:

6

I'm trying to figure out the "best practices" for deciding whether or not to add an auto-incrementing integer as the primary key to a table.

Let's say I have a table containing data about the chemical elements. The atomic number of each element is unique and will never change. So rather than using an auto-incrementing integer for each column, it would probably make more sense to just use the atomic number, correct?

Would the same be true if I had a table of books? Should I use the ISBN or an auto-incrementing integer for the primary key? Or a table of employees containing each person's SSN?

+3  A: 

There are a lot of already addressed questions on Stack Overflow that can help you with your questions. See here, here and here.

The term you should be looking for: surrogated keys.

Hope it helps.

Pablo Santa Cruz
Ah. Thanks. I search around a bit but couldn't figure out how to properly phrase the keywords.
jamieb
Awesome. Glad to help.
Pablo Santa Cruz
+2  A: 

You've got the idea right there.

Auto-increment should be used as a unique key when no unique key already exists about the items you are modelling. So for Elements you could use the Atomic Number or Books the ISBN number.

But if people are posting messages on a message board then these need a unique ID, but don't contain one naturally so we assign the next number from a list.

It make sense to use natural keys where possible, just remember to make the field as the primary key and ensure that it is indexed for performance

Craig Warren
A: 

I'm trying to figure out the "best practices" for deciding whether or not to add an auto-incrementing integer as the primary key to a table.

Use it as a unique identifier with a dataset where the PKey is not part of user managed data.

Let's say I have a table containing data about the chemical elements. The atomic number of each element is unique and will never change. So rather than using an auto-incrementing integer for each column, it would probably make more sense to just use the atomic number, correct?

Yes.

Would the same be true if I had a table of books? Should I use the ISBN or an auto-incrementing integer for the primary key? Or a table of employees containing each person's SSN?

ISBNs/SS#s are assigned by third-parties and because of their large storage size would be a highly inefficient way to uniquely identify a row. Remember, PKeys are useful when you join tables. Why use a large data format like an ISBN which would be numerous textual characters as the Unique identifier when a small and compact format like Integer is available?

Nissan Fan
+1  A: 

The main problem that I have seen with the auto incrementing an integer approach is when you export your data to bring into another db instance, or even an archive and restore operation. Because the integer has no relation to the data that it references, there is no way to determine if you have duplicates when restoring or adding data to an existing database. If you want no relationship between the data contained in the row and the PK, I would just use a guid. Not very user friendly to look at, but it solves the above problem.

Rob Goodwin
+2  A: 

With regards to using ISBN and SSN you really have to Think about how many rows in other tables are going to reference these through foreign keys because those ids will take up much more space than an integer and thus may lead to a waste of disk space and possibly to worse join performance.

klausbyskov
+2  A: 

This is a highly debated question, with lots of emotion on both sides.

In my humble opinion, if there's a good, useable natural key available -- like an ISBN -- I use it. I'm going to store it on the database anyway. Yes, a natural key is usually bigger than an integer auto-increment key, but I think this issue is overblown. Disk space is cheap today. I'd worry more about it taking longer to process. If you were talking about an 80 byte text field as a primary key, I'd say no. But if you're thinking of using a 10-byte ISBN instead of an 8-byte big integer, I can't imagine that brings much of a performance penalty.

Sometimes there's a performance advantage to natural keys. Suppose, for example, I want to find how many copies of a given book have been sold. I don't care about any of the data from the Book master record. If the primary key is ISBN, I could simply write "select count(*) from sale where isbn='143573338X'". If I used an autoincrement key, I would have to do a join to look up the isbn, and the query becomes more complex and slower, like "select count(*) from book join sale using (bookid) where isbn='143573338X'". (And I can assure you that that as this particular ISBN is for my book, the number of sale records is very small, so doing the join and reading one extra record is a big percentage difference!)

Another advantage of natural keys is that when you have to work on the database and you look at records that refer back to this table by key, it's easy to see what record they're referring to.

On the other hand, if there is no good, obvious natural key, don't try to cobble together a crazy one. I've seen people try to make a natural key by concatenating together the first 6 letters of the customers first name, his year of birth, and his zip code, and then pray that that will be unique. That sort of silliness is just making trouble for yourself. Often people end up taking on a sequence number to insure it's unique anyway, and at that point, why bother? Why not just use the sequence number by itself as the key?

Jay