views:

273

answers:

8

I don't believe in a Silver Bullet, but I really like to use sequences or autonumber identity columns as my primary key columns for database tables. They're unique, they index well, and I don't have to worry about null values.

On the other hand, in some cases, they seem redundant when there are other unique columns in the table that could serve the same purpose. Say, for example, you're building a table that maps 9-digit ZIP codes to city zones. The ZIP code field could work just as well (provided you can guarantee the data format and no duplication of values).

To the point: My experience, as it is with any of us, is limited. What other real-world examples have lead folks to choose not to use an autonumber column as the primary key for a table, and why?

This is a "broaden your horizons" type of thing for me, and I hope to learn a bit from folks who have worked with a plethora of databases and had compelling reasons to choose otherwise.

+4  A: 

A link table springs to mind as the most obvious choice for a composite key

John Nolan
Yeah, these are prime examples of tables that perplex me. I have caught myself modeling these with autonumber columns on them, and then asking myself, "Why?" Both columns are unique; they're foreign keys! Adding another column is superfluous!
Mike Hofer
+1  A: 

Really the only time I can think of to use an identity column is when the number of fields required to make a primary key is large, or if the field which is the primary key is really big (like a 20 character string). In all other instances, I prefer not to use them.

The problem no one every brings up about identities, is what happens when something happens to the data. Since the key is based solely on when the record was added, reloading data into a table after a catastrophic event is a real problem. Now the dbms should help you out and prevent someone from truncating a table, or switching the values of the primary keys...should. Things happen, tables get corrupted, or database updates run into problems. With identity primary keys, all of a sudden you're left with a mess trying to figure out which identity values goes with which row....wait except you can't, because the identity value has no meaning in regards to the data. With handful of entries, you might be ok, but when you start having larger tables of maybe a couple million values (ours was a little over 11 million when this happened) that gets to be really problematic in a hurry. Everyone says, "that's a worse case scenario, it'll never happen." That is until it does.

Kevin
what would your PK be on a Users table? their user id? I hope no one gets married and wants to change their user id, it would be a real pain to alter all the FKs to the new user id!
KM
To add to this, I would say that if you do use an identity, you should add other constraints on your "real" data columns to ensure data integrity, e.g. UNIQUE constraints etc. In this case I believe you would call the identity column a "surrogate key". Of course it does all depend on what data you are actually modeling.
Nathan
when the key can change based on some external issue (business logic, user whim, etc) you need and surrogate key, unless you have no FKs or like to update all of them with the new value.
KM
1. Cascade update2. If you have your PKs changing, you really didn't select a good primary key.3. Yes, it'd be the user id. Simple solution, don't allow people to change it. Really how many times have you seen a user id change? Marriage has nothing to do with it. It can be a security risk and you don't allow it. If you are really concerned about changing a user id, the account should be disabled and a new one created.
Kevin
@Kevin, in the system we work on, people have changed used ids. you tell the divorced lady you have to keep your old id, I'm not going to. They don't want all of their previous/current work unassociated with them, which happens with a new id.
KM
I have no problem doing that. My job is to protect highly sensitive information. It may sound cold, but someone having hard feelings because they want their username changed is not at the top of my priorities. Changing usernames is just another avenue a potential method an attacker can mask breaking into a system.
Kevin
I see my users in the hallway.
KM
People can go to jail if I make a mistake.
Kevin
@Kevin said "People can go to jail if I make a mistake.", if that's the case, I'd never change a key value, and go with an identity surrogate.
KM
You're missing the point. If your key value is subject to change, then it isn't a primary key value. You're talking about changing a username, which I wouldn't do. That leads to potential security holes and can be used as a way of covering tracks when breaking into a system or gaining information about accounts.
Kevin
the display value of a user name or id isn't a key, that is why I use an identity
KM
Your usernames aren't unique?
Kevin
+4  A: 

IMHO it is crucial to use a identity column as even the simplest table can become more important in the future.

The only time i wouldn't use one would be where i used a GUID instead, for circumstances where perhaps records were created on disconnected clients that then needed to synchronize with a central system.

Iain Hoult
It's that whole "You never know" thing that gets me. I don't want to over-design the database (YAGNI), but, at the same time, I don't want to have to go back in and deal with the hassles of adding an autonumber column after-the-fact, either. It would be unsettling to discover that our assumptions about a "rock-solid" primary key selection turned out to be false, and we had to backtrack on them.
Mike Hofer
+3  A: 

I'm a firm believer in the use of technical primary keys almost without exception so my answer would have to be... never.

cletus
Could you clarify what you mean by "technical primary key"?
Mike Hofer
A: 

One area where I have not used an autonumber field is when defining a DateDimension table as part of a star schema. In this instance I used an integer representing the date in yyyymmdd format. This allowed for fast joins between the central fact table and the DateDimension (as an auto-number ID column also would). However ...

The DateDimension table contained other date representations (e.g. smalldatetime column, dayOfWeek column, etc). If users only wanted the date in yyyymmdd format the join wasn't necessary as the date dimension key in the central fact table already stored this information.

In general I'm not a big fan of keys containing business information. Typically, the assumption you make about the primary key when designing the schema won't hold true over time and you'll become unstuck. In this case I was fairly sure that the date would not!

Adamski
+4  A: 

my rule of thumb is: "if you're going to add records in normal usage, use an autoincrement PK; if it's a static table, use whatever identifier is more 'natural'"

IOW: users, history records, assets; all get an autoincrement PK. zip/city, type/descriptions, machine IDs, usually get a 'natural' key.

Javier
+2  A: 

I will typically avoid auto_increment columns in situations where frequent data dumps / loads / merges are required and I have foreign key relationships. Attempting to merge data from two table instances of the same schema that use auto incrementing ids is a terrifying problem.

This sort of usage doesn't crop up for most, but my line of work involves a lot of batch processing where each batch then gets merged in to a master database for later analysis/use.

TK
i'd guess in those cases using an UUID as PK would do the trick.
Javier
A: 

One exception to the principle expressed by Iain Hoult, Javier, and TK would be the use of an employee number or "badge number" as the PK of a personnel table. In this case, the PK can be called a "meaningful key" only because we have handed the employee the PK of his personnel record.

-Al.

A. I. Breveleri