Primary key: code or name?

tags:

database-design

+3 Q:

One of the most common kinds of database table has an alphanumeric code and human friendly name (e.g. countries, currencies, accounts, products, VAT codes etc.)

The traditional/obvious thing to do is make the code the primary key. And for some tables, e.g. customers, where the number of rows may be large and names may not be unique identifiers, this is clearly the correct thing to do.

But what about things like countries and currencies where the number is guaranteed to be small and the names are guaranteed to be unique? In that case, references will almost always be both input and displayed with the human friendly name.

In that scenario, is there any reason not to make the name the primary key?

+1 A:

If you're going to be using the value of these fields in other tables, then you would be violating normalization principles of database design; specifically, you would have the same data in multiple places. This is bad for several reasons:

numbers are faster to search than strings
strings take up more memory than numbers

Anatoly G 2010-10-19 07:12:29

The codes are still alphanumeric in all cases, but it doesn't matter, machine resources aren't in short supply. (If anything, name as primary key would improve the performance for large enough data volumes, by avoiding a join.)

rwallace 2010-10-19 07:14:54

De-normalization is for sure a tactic that helps when you're too normalized. I prefer IDs, as some strings are quite long, and at large enough volume it's better to conserve space than not.

Anatoly G 2010-10-19 07:18:38

+3 A:

A few thoughts... If you're interfacing with other applications then the ISO standards for countries and currencies would be better received.

Although such data is more or less static ... country names do change, e.g. Ceylon to Sri Lanka, Rhodesia to Zimbabwe, so you would need to update in many places rather than just the description on your lookup table.
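As a rough illustration of the rename point, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (country, customer, country_code) and the sample rows are assumptions made up for the example, not anything from the original schema. With the code as the key, the rename is a single-row update on the lookup table and the referencing rows are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE country (
        code TEXT PRIMARY KEY,        -- stable code, e.g. ISO 3166-1 alpha-2
        name TEXT NOT NULL UNIQUE     -- human-friendly name, free to change
    );
    CREATE TABLE customer (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        country_code TEXT NOT NULL REFERENCES country(code)
    );
    INSERT INTO country VALUES ('LK', 'Ceylon');
    INSERT INTO customer (name, country_code)
        VALUES ('Acme Tea', 'LK'), ('Kandy Spices', 'LK');
""")

# The rename touches exactly one row in the lookup table.
renamed = conn.execute(
    "UPDATE country SET name = 'Sri Lanka' WHERE code = 'LK'"
).rowcount

# Referencing rows pick up the new name through the join.
rows = conn.execute("""
    SELECT customer.name, country.name
    FROM customer JOIN country ON country.code = customer.country_code
    ORDER BY customer.id
""").fetchall()
print(renamed, rows)
```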

Input/display are not always undertaken using the friendly name, e.g. data entry of currency where the users are comfortable with this.

If picking a country from a list, then it's fairly trivial to have this translated to a code under the bonnet.

richaux 2010-10-19 07:29:45

+1 A:

Never, ever, use a business value as a key.

Here's why: No matter how sure you are that the customer number is immutable, sometime in the future a customer number will have to be changed (Chinese customer gets 4444, which is very unlucky, whatever). If you've used 4444 as a key, you'll have to change not only the customer's key, but also the related records in his orders, his addresses, etc.

(Some will argue that this can be resolved with cascading updates, but it's risky in the presence of triggers.)
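For reference, the cascading-update approach mentioned above might look like the following sketch in Python's sqlite3 module. The schema (a country table keyed by name, a customer table referencing it) is a hypothetical example, and the sketch only shows the cascade itself; it does not model triggers or history tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
    CREATE TABLE country (
        name TEXT PRIMARY KEY          -- natural key: the name itself
    );
    CREATE TABLE customer (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        country_name TEXT NOT NULL
            REFERENCES country(name) ON UPDATE CASCADE
    );
    INSERT INTO country VALUES ('Rhodesia');
    INSERT INTO customer (name, country_name)
        VALUES ('Harare Motors', 'Rhodesia'), ('Bulawayo Mills', 'Rhodesia');
""")

# Changing the key rewrites every referencing row via the cascade.
conn.execute("UPDATE country SET name = 'Zimbabwe' WHERE name = 'Rhodesia'")

referencing = [
    r[0] for r in conn.execute("SELECT country_name FROM customer ORDER BY id")
]
print(referencing)
```

Note that the database rewrites one row per referencing record here, whereas a code or surrogate key would leave those rows untouched.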

Best practice: Create a surrogate key and call it ID (some prefer CustomerID). A surrogate is a key which is hidden from the users (thus its name) and whose only purpose is to provide a unique key. This allows you to make unambiguous joins and deletes without worrying about what users might change.
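A minimal sketch of the surrogate-key idea, again with Python's sqlite3 module. The table names, the customer_number column, and the replacement number 4445 are assumptions invented for the example. The business value lives in an ordinary unique column, so changing it never touches the rows that reference the customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id              INTEGER PRIMARY KEY,   -- surrogate key, hidden from users
        customer_number TEXT NOT NULL UNIQUE,  -- business value, allowed to change
        name            TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item        TEXT NOT NULL
    );
    INSERT INTO customer (id, customer_number, name)
        VALUES (1, '4444', 'Example Trading');
    INSERT INTO orders (customer_id, item)
        VALUES (1, 'widgets'), (1, 'gears');
""")

before = conn.execute("SELECT customer_id FROM orders ORDER BY id").fetchall()

# Only the customer row changes; the orders keep pointing at id = 1.
conn.execute("UPDATE customer SET customer_number = '4445' WHERE id = 1")

after = conn.execute("SELECT customer_id FROM orders ORDER BY id").fetchall()
order_count = conn.execute("""
    SELECT COUNT(*)
    FROM orders JOIN customer ON customer.id = orders.customer_id
    WHERE customer.customer_number = '4445'
""").fetchone()[0]
print(before == after, order_count)
```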

Every table should have exactly one surrogate primary key and it is either an auto-incremented integer or a GUID (varies according to the database provider).

There is only one allowable exception to this rule: when creating an N-N relation (e.g. one customer can have many addresses and addresses can be shared by customers). In this case it is acceptable to use the [CustomerID, AddressID] pair as the primary key.
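The N-N exception could be sketched as below, using Python's sqlite3 module. The customer_address table name and the sample rows are hypothetical. The composite primary key lets one customer have several addresses and one address serve several customers, while rejecting a duplicate link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
    CREATE TABLE customer_address (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        address_id  INTEGER NOT NULL REFERENCES address(id),
        PRIMARY KEY (customer_id, address_id)  -- the pair is the key
    );
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO address  VALUES (10, '1 High Street'), (11, '2 Low Road');
    -- Alice has two addresses; address 10 is shared with Bob.
    INSERT INTO customer_address VALUES (1, 10), (1, 11), (2, 10);
""")

# Inserting the same pair again violates the composite primary key.
try:
    conn.execute("INSERT INTO customer_address VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM customer_address").fetchone()[0]
print(duplicate_rejected, link_count)
```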

Oh, and finally, joins on integers/GUIDs, which are fixed-length, are much faster than joins on varying-length strings.

smirkingman 2010-10-19 07:37:27

I was indeed planning to use cascading updates for exactly this purpose. Risky in the presence of triggers, how do you mean? For example, I'm using triggers to update history tables, and I thought the logic was sound (being careful to store old as well as new primary key values), what problems should I be looking out for?

rwallace 2010-10-19 07:44:25

The answer is in your question: "being careful to store old as well as new primary key values". If you use surrogate keys, all these problems disappear.

smirkingman 2010-10-19 07:57:01

That's a very good point actually. If I use GUIDs across the board, all those problems disappear, and I've already got to write the code for data entry with something that isn't the primary key anyway. Maybe that's the right way to go. Okay, thanks!

rwallace 2010-10-19 08:17:42

You're welcome, thanks for accepting the answer.

smirkingman 2010-10-19 10:22:41

The primary key is not only a means to retrieve records from the table it is defined in. It is also used to reference records from this table elsewhere (foreign key). Using names rather than codes (that do not change!) as the primary key is subject to:

A lot of maintenance in case of a name change
Poorer performance when comparing names in joins and lookup queries
More disk space required in tables and indexes (not significant for small tables but can become problematic for big ones)

Note that even though comparing numbers is faster than comparing strings, I'd choose to stick to small, non-changing codes (<10 letters) rather than having ID (number as PK) + Code (string as UK) + Name (string), for practical reasons.

vc 74 2010-10-19 07:52:24
