views:

105

answers:

2

Hi there, what is the benefit of having an additional identity column in each table of a database? What are the drawbacks?

Update: Now i want to expand the case and introduce replication. What for is this surogate key (identity comlumn) aside of the rowguid we get with the replication. Behalf of K. Brian Kelley's objection one should set the clustered index on this rowguid (and forget the identity-column). WHat do you think?

+4  A: 

Often the use of surrogate keys, of which IDENTITY columns are the most common, are done for performance reasons. You should still identify the natural keys (the columns which make the row unique based on the data).

Typically surrogate keys are integer values. That makes them easy to link together using joins with other tables (and restrict accordingly using foreign keys). Also, when talking about SQL Server, all nonclustered indexes depend on the clustered index. So if the clustered index is based on the natural key and is large in size as a result, then all the nonclustered indexes are going to be large as well, because they will refer back to the clustered idex. Therefore, a lot of folks build the primary key around that integer-based surrogate key. I know I'm simplifying it a bit, but that's a key reason for the use of surrogate keys.

The drawback is that the surrogate key is it is effectively meaningless. If someone were to change the value of the key, you could break a relationship if the foreign key constraints are not present or are disabled. In the case of making a change which alters the alternate key, you're actually changing the data itself. So you would expect such a break if the entities are built properly and you would be changing the data in related tables as well.

K. Brian Kelley
Actually, an even more important reason for a surrogate key is that it protects you from changes in the data model. What if the data model changes and with it your natural key? If you use the natural key, you have to change all foreign keys (and all code using it) as well. Surrogate keys prevent this. Plus there are even cases where no reasonable natural key exists (e.g. persons in an address book, there are different people with the same name and the same address).
sleske
I'll agree definitely on the no natural key one. Sometimes it just doesn't exist based on the known data. However, the purist would argue that if the natural key changes then the foreign key should change, too. I've got into those arguments and you're not going to change their opinion.
K. Brian Kelley
+3  A: 

Short version: Surrogate or synthetic key (what you probably mean by "identity column") versus natural key is a very old debate.

Pros of surrogate keys:

  • makes you independent of changes in the natural key (think new requirements / changing domain model), which would otherwise cascade through your data model
  • sometimes, there is no natural key (e.g. persons in an address book)
  • is often faster, because it is shorter (single int column, instead of e.g. several varchars)
  • is more convenient in joins etc. because it's only a single column

Cons:

  • you still need a unique index for every candidate natural key (so one more index needed)
  • it is foreign to the domain, and may require an additional join to get the real data

Generally, the agreement is that surrogate keys are usually a good idea, except in simple cases like join tables.

For all the details, see Wikipedia, which has a good article on the topic.

sleske