views:

258

answers:

10

Is there a reason to use a single incrementing field for a primary key instead of multiple fields that actually represent the unique record?

I'm working on an existing php application, and the tables all seem to have a single 'id' key instead of using the 2 or more fields that are actually unique to the record (like user, auction, bid).

I'm not a database expert, but that just seems lazy (or inexperienced) to me. Is there any benefit (performance or otherwise)?

Updated: I'm not referring to psudo-unique data (ssn, e-mail address, etc), where you may want to ensure the data is really unique. I'm talking about tables with obvious foreign key references, but instead of using those references along with the unique field(s) in the table itself, every table just has an incrementing ID.

Not trying to start up a subjective debate, it just didn't make sense to me.

+5  A: 

It depends on the definition of "Unique". Yes, names, email addresses, and SSN values are supposed to be "unique". However, stranger things have happened. Having a separate ID value, in a lot of cases, can make life a lot easier...

Update

Based on the edit to the question, I don't really see much of a need. It sounds like the situation you have is something like. a "join table", something where you are simply creating an association of a UniqueId from one table to the UniqueId of another table.

A simple example of what I'm thinking you are talking about would be a User -> Role association. You must associate a User to a Role. A UserId and a RoleId.

You have in your database a structure similar to

MappingId (Your Auto Number) (This is the PK)
UserId (From the user table)
RoleId (From the Role table)

This structure does NOT make sense to me, I would have just the User and RoleId make up the Primary Key, since there is no need to duplicate entries here.

If you have something different that might change things...

Mitchel Sellers
Agreed. I typically opt for a sythetic primary key but then add an index with a uniqueness constraint to identify the natural key. That way I can change / delete the index if my initial assumptions during design no longer hold.
Adamski
SSNs will be reused over time. After all, the number of people alive in the US is estimated to be 306,808,431 according to the US Census population clock... nearly 1/3 of the available space for SSNs.
R. Bemrose
Updated the question to make it clearer, not talking about questionably unique data. I mean a table with a userid, auctionid, etc, but sill have a single auto incrementing primary key.
Tim Lytle
Posted updated comments
Mitchel Sellers
I wanted it to be more abstract, guess that might have been less than helpful. Example: A 'bid' table has the auctionid, the userid, and the bid amount (along with other data about the bid). Why then do they use a bidid as the primary key? It's not just joining tables, but it seems to me that you would want to use the other keys to *avoid* duplicate data. So a user can't bid the same amount more than once (sure the code should check that, but shouldn't the database ensure that as well?
Tim Lytle
A: 

Well, the id's give a sequential ordering to your database from 1-infinity. User names and such are transient and aren't always ordered. So, presumably, it would make searching faster. Plus, it seems like you're suggesting having multiple keys signify an item. That is generally going to slow things down because now two things have to be checked to make sure something is the right item instead of one.

Peter
+6  A: 

Using synthetic primary keys has several advantages:

  • You can change values in key fields without having to take an index update hit
  • The indexes are smaller
  • It makes foreign key relationships simpler
  • Since you're not dealing with strings, there are never encoding issues

Databases often have specific optimizations around building indexes with monotonically incrementing keys.

That being said, there is nothing wrong with a little denormalization now and then. If the use-case is clear and the tables are relatively small, do what's convenient.

lavinio
+1  A: 

You generally want a clustered index on your primary key. The issue with having a compound, clustered, primary key is that as you insert new rows SQL has to stick the new record in between other records which means shuffling. In addition, the larger your primary key the more space is required to store it.

Here is an article on using a GUID as a primary key, but the same holds true for a compound key.

Also see this great answer.

Talljoe
+3  A: 

Oh dear, it looks like we're opening up the great natural vs. surrogate keys debate again.

The simplest reason is to prevent data redundancy. Natural keys tend to require multiple keys that may change over the lifetime of the database.

For example, if a person gets married and changes their last name, then that last name has to be updated everywhere it's referenced.

This isn't a problem if you have your foreign keys set to on update cascade, as the DB will do it for you.

As your table nest further and further, you may find your keys need more and more columns. I've actually seen a table that had a seven-column primary key. For a table that only had four other columns.

R. Bemrose
On redundancy, now the code is responsibly to check for duplicate data, instead of the database stopping duplicate data. For example, in the code that prompted this question, before an auction offer can be submitted the code has to check the database for any offers of the same value from the same user for the same auction. Sure it should do that anyway, but if the code fails, the database won't stop it. In that case there's duplicate data.
Tim Lytle
R. Bemrose wrote : This isn't a problem if you have your foreign keys set to on update cascade, as the DB will do it for you.But all references to this record outsite your database are lost. If you have interfaces with other systems your in trouble.
Robert Merkwürdigeliebe
@Tim Lytle: There's nothing stopping you from having unique constraints.
R. Bemrose
A: 

Here are a few points to use autonumber

  • Autonumbers are a single unique key that make foreign key relationships much easier to maintain and use

    Autonumbers are numbers so it is pretty easy to use them and not mess them up. What I mean is, if your primary key is a string and your developer forgets to put that in single quotes it will destroy your performance

    It is normal standard practice to use an autonumber

    You can still make other fields "unique"

    resetting a sequence is much easier with an autonumber

    if you need to jump ahead in the sequence it is much easier with a number than a combo of attributes, or strings

Just a few things...

northpole
A: 

in most cases, it's really not clear-cut when those fields really uniquely identify the entity represented by the record. again and again I've seen cases where old database concepts entrenched in the business mentality hamper any further evolution.

Javier
A: 

Yes, this will spur debate.

In general, primary key data should be immutable, which is frequently not the case when using a natural key derived from the table data. As noted earlier, things like SSNs can often be changed, thus throwing off immutability.

Monotonically increasing surrogate keys, like "autonumber" or "identity" columns, are a simple substitute for a natural key. However, they can be prone to index inefficiencies since they may not balance well across B-tree style index algorithms. This can be remedied by using a randomly generated surrogate key, like a uniqueidentified, i.e. GUID, in MS SQL Server, but I've read that this also has performance ramifications.

Generally, I use a surrogate key produced from a sequential feature like autonumber or identity for ease of table joins.

Bob Mc
A: 

Almost every "natural" key combination I've ever tried to use in a database ended up being non-unique over time. Data models need to evolve quickly as abstractions become leaky.

That includes names, phone numbers, SSNs, legal document references, page numbers, email addresses, usernames, project numbers, and a few other things I've tried to use over my career.

Aside from that, the other answers regarding performance for writing new records, comparing foreign keys, etc. are enough reason alone.

You can preserve your current business logic of uniqueness without baking it into the primary key--just set up a unique index on your natural-key columns. You'll pay a price on inserts and updates, as with any index, but if it happens to also be a useful index (helps to cover some queries), all the better.

richardtallent
So if the 'natural' key is replaced by a surrogate key, say a username becomes a userid (unique generated number), why not use the userid (along with other immutable ids) as the keys for related tables?
Tim Lytle
A PK, by definition, is unique. For argumentative purposes, you have just created a PK every time that a Unique Constraint is built. Why not just break off the table and replace it with a view?
WolfmanDragon
A: 

It all comes down to how "normal" your data structure is. A highly normalized database,by definition, can only have single field for the primary key. In this case, there is little to any reason to use a serial or auto generated number as a PK. The data structure should be designed with with unique entries as PK(tracking people is a problem, there are only so many names).

Of course with normalization comes the performance penalty, so the database is de-normalized to make is usable(very common for web apps). With a heavily de-normalized DB, many times it is impossible to get a PK without using every field in the table. Remember that the reason that the structure is de-normalized is to increase the performance. All of the databases that I am familiar with builds an index for every PK. The larger the index, the larger the overhead to maintain the index.

Building gigantic indexes will kill the insert and update time performance, making the de-normalization useless(unless its a read-only DB). It also takes longer to search gigantic indices and uses more memory than smaller ones as well.

In summery, It is often advantageous for performance reasons to to auto-generate the PK for any table that requires multiple fields to get a unique PK.

WolfmanDragon