views:

290

answers:

6

I have several database tables that just contain a single column and very few rows, often just an ID of something defined in another system. These tables are then referenced with foreign keys from other tables. For example one table contains country codes (SE, DK, US etc). All values are always unique natural keys and they are used as primary keys in other (legacy) systems.

It seems really unnecessary to introduce a new surrogate key to these tables, or?

In general, what are the exceptional cases when surrogate keys shouldn't be used?

+1  A: 

Natural keys (country codes in your case) are better because

  • they make sense when you see them (Surrogate key alone means nothing to the user. This is important for the DB developers and maintainers who often need to work with raw DB outputs)
  • less joins (often you need only the country code, and they're already in other tables. If you use surrogate keys, then you'll need to join the lookup table)

The downside of the natural keys is that they're tied to the information logic, and if it changes (which sometimes happens), you need to alter a lot of tables, basically overhauling a significant part of the DB.

So, if in your DB the logic doesn't change for many years, use natural keys.

culebrón
Surely surrogate keys should never be presented to the user, they are being used to associate data. I would normally present primary keys to the user (unless they are natural keys in which case they are presented as 'keys' but rather as the data itself).
Lazarus
I've added to my response that this is an advantage for development. Often you have a text dump of a table, or work in another non-IDEish environment and can't lookup the reference table. Surrogate keys in this case slow the work down significantly.
culebrón
That's true, unless the surrogate is part of the record (and appropriately indexed) then the overhead for the link table or whatever other method is used can become significant as demand for the data grows.
Lazarus
Looking at my first comment, I've managed to completed mess up the English. It should be "I would not normally present primary keys" and "presented not as 'keys'". Sometimes my ability to bugger up a sentence amazes me.
Lazarus
+3  A: 

I'm not sure there's an exception case when surrogate keys shouldn't be used. I think that the nature of a surrogate key, generally to make a reference globally unique, is particularly relevant when applied to a system such as you describe.

While each of the satellite primary keys you mention may be unique within their own scope, you can't really guarantee that they will remain unique across the whole scope of your interconnected environment, especially if it expands. I suspect the original designers were either trying to future proof their system or riding the latest fad they'd learnt ;)

Lazarus
A natural key is also globally unique, otherwise it wouldn't be a natural key.If you mean that surrogate keys are unique across the whole database then are you saying that you use a single counter shared across all tables?!
andrew cooke
If it were true that natural keys were *globally* unique then there'd be no need for surrogate keys at all. Natural keys tend to be unique within the scope for which they were selected but often not beyond. When working within an environment like an enterprise where there may be many similar systems created over time (and acquired through company purchase) by varying elements in the company, you often find that natural keys are either not unique or, just as bad, there are multiple keys for a single target.
Lazarus
I should add that I'm thinking very much in a data warehouse manner. For exchanging data between systems, I'd tend to use a gateway in the middle and take the performance hit to allow for extensibility in the future.
Lazarus
+8  A: 

I would say the following criteria must be met:

  • your natural key must be absolutely, positively, no-exceptions-allowed, unique (things like names, social security numbers etc. usually seem to be unique - but really aren't)

  • your natural key should be as small as an INT, e.g. not significantly more than 4 bytes in size (don't use a VARCHAR(50) for your PK, and especially not for your clustering key in SQL Server !)

  • your natural key ought to be stable, e.g. never change (OK, with ISO country codes, this is almost a given - except when countries like Yugoslavia or the USSR collapse, or other like the two Germanies unite - but that's rare enough)

If those conditions are met, you can consider a natural key as your PK - but that should be the 2% exception in all your tables - not the norm.

Marc

marc_s
+2  A: 

There's a long standing debate on this. If you google for "surrogate v natural keys" you will get many links. So I suspect you will get a debate rather than a clear answer here.

From this article:

Data modelers (for this discussion, I include anyone who has designed tables for a database) are divided on this question: Some modelers swear by the surrogate key; others would die before they used anything but a natural key. A search of the literature about data modeling and database design supports neither side except in the data warehouse arena, in which a surrogate key is the only choice for both dimension and fact tables.

andrew cooke
A: 

In addition to what marc_s said, you do not need a surrgogate key generally in a linking table that is a table that contains two different primary keys only that is used to create many-to-many relationships. In general, a composite key on both fields works fine here. This is one of the few times I suggest a composite key, in general I prefer a surrogate key and a unique index on the composite key.

HLGEM
A: 

Using natural keys for identification purposes is a good idea whenever natural keys can really be trusted. See Marc_S response for some cases where natural keys cannot be trusted. Don't worry too much about efficiency. Even something long like the VIN (Vehicle Identification Number) won't drag your database down very much. If you think it will, make a few tests, realizing that efficiency does not scale linearly.

The primary reason for declaring a primary key is to prevent a table from slipping out of first normal form, and thereby no longer representing a relation. The use of an autoincremented surrogate key can result in two rows with different id fields, but otherwise identical. This will bring you some of the problems that go with data that's not in first normal form. And the users won't be able to help, because they can't see the id field.

If a table's rows can be determined by some combination of two or more foreign keys, what you have is a relationship table, sometimes called a linkage table or a junction table. You are usually better off to declare a composite primary key consisting of all the needed foreign keys.

If the above choices result in slow preformance, sometimes that can be remedied by creating some extra indexes. It depends on what you are doing with the data.

Walter Mitty