I'm designing a small SQL database to be used by a web application.

Let's say a particular table has a Name field for which no two rows will be allowed to have the same value. However, users will be able to change the Name field at any time.

The primary key from this table will be used as a foreign key in other tables. So if the Name field were used as the PK, any changes would need to be propagated to those other tables. On the other hand, the uniqueness requirement would be handled automatically.

My instinct would be to add an integer field to act as the PK, which could be automatically populated by the database. Is there any point in having this field or would it be a waste of time?

+22  A: 

I would use a generated PK myself, just for the reasons you mentioned. Also, indexing and comparing by integer is faster than comparing by strings. You can put a unique index on the name field too without making it a primary key.
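A minimal sketch of that layout, assuming SQLite through Python's `sqlite3` module. The `customer` and `invoice` tables and their columns are made up for illustration and are not from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,   -- surrogate key, populated by the database
        name TEXT NOT NULL UNIQUE   -- uniqueness enforced without being the PK
    );
    CREATE TABLE invoice (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total       INTEGER NOT NULL
    );
""")

acme_id = conn.execute(
    "INSERT INTO customer (name) VALUES (?)", ("Acme",)
).lastrowid
conn.execute(
    "INSERT INTO invoice (customer_id, total) VALUES (?, ?)", (acme_id, 100)
)

# Renaming touches a single row; the invoice keeps pointing at the same id.
conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Acme Ltd", acme_id))
invoice_owner = conn.execute(
    "SELECT c.name FROM invoice i JOIN customer c ON c.id = i.customer_id"
).fetchone()[0]

# A second row with the same name is rejected by the UNIQUE constraint.
try:
    conn.execute("INSERT INTO customer (name) VALUES (?)", ("Acme Ltd",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Other databases spell the auto-generated column differently (identity columns, sequences, `AUTO_INCREMENT`), but the shape of the schema is the same.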

Paul Tomblin
This is generally the accepted way of handling primary keys. You should never choose a column for your primary key where its value can be changed - as you then get into a cascading update for all of the tables with that as a foreign key.
Ken Ray
A: 

The PK must be UNIQUE for every row. An auto_increment integer is a very good idea, and if you don't have other ideas about populating the PK then this is the best option.

Mote
+2  A: 

Yes - and as a rule of thumb, always, for every table.

You should definitely not use a changeable field as a primary key and in the vast majority of circumstances you don't want to use a field that has any other purpose as a primary key.

This is basic good practice for db schemas.

Murph
+1  A: 

Having an integer PK is always a good thing from a performance perspective. All of your relationships will be much more efficient with an integer PK. For example, JOINs will be very much faster (MS SQL).

It will also make future modifications of the database easier. Quite often you have a unique name column only to find out later that the name is not unique at all.

Right now you could enforce the uniqueness of the Name column by having an index on it as well.

Ilya Kochetov
A: 

One exception commonly found is for 'system' data, i.e. stuff you are defining yourself, such as status fields.

ShoeLace
+1  A: 

I would use an auto generated ID field for the Primary Key. It's easier to join with tables based off integer IDs than text. Also, if the Name field is updated often, if it were primary key, the database would be put under stress for updating the index on that field much more often.

If the Name field is always unique, you should still mark it as unique in the database. However, often there will be a possibility (maybe not currently but possibly in the future in your case) of two identical names, so I do not recommend using it as the primary key.

Another advantage for using IDs is in the case you have a reporting need on your database. If you have a report you want for a given set of names, the ID filter on the report would stay consistent even when the names might change.

jeffl8n
+9  A: 

What you are describing is called a surrogate key. See the Wikipedia article for the long answer.

finnw
+1  A: 

If you're living in the rarefied circles of theoretical mathematicians (like C. Date does in the-land-where-there-are-no-nulls, because all data values are known and correct), then primary keys can be built from the components of the data that identify the idealized platonic entity to which you are referring (i.e. name+birthday+place of birth+parents' names), but in the messy real world "synthetic keys" that can identify your real-world entities within the context of your database are a much more practical way to do things. (And nullable fields can be very useful too. Take that, relational-design-theory people!)

Jeffrey L Whitledge
Let's hope Celko never becomes a Stacker or we are all in for a major dressing down.
Darrel Miller
Oh, we're all gonna get it bad when that happens!
Jeffrey L Whitledge
A: 

The PK for a record must be unique and permanent. If a record naturally has a simple key which fulfills both of those, then use it. However, they don't come around very often. For a person record, the person's name is neither unique nor permanent, so you pretty much have to use an auto-increment.

The one place where natural keys do work is on a code table, e.g. a table mapping a status value to its description. There is little sense in giving "Active" a PK of 1, "Delayed" a PK of 2 etc. when it is just as easy to give "Active" a PK of "ACT"; "Delayed", "DLY"; "On Hold", "HLD" and so on.
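As a rough sketch of such a code table, assuming SQLite via Python's `sqlite3`. The `task` table and its rows are invented here to have something that references the codes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE status (
        code        TEXT PRIMARY KEY,   -- short natural key
        description TEXT NOT NULL
    );
    CREATE TABLE task (
        id          INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        status_code TEXT NOT NULL REFERENCES status(code)
    );
    INSERT INTO status VALUES
        ('ACT', 'Active'), ('DLY', 'Delayed'), ('HLD', 'On Hold');
    INSERT INTO task (title, status_code) VALUES
        ('Write report', 'ACT'), ('Ship order', 'DLY');
""")

# The code is readable on its own, so filtering needs no join to status.
delayed_titles = [
    row[0]
    for row in conn.execute("SELECT title FROM task WHERE status_code = 'DLY'")
]

# The foreign key still rejects codes that are not in the code table.
try:
    conn.execute("INSERT INTO task (title, status_code) VALUES ('Bad', 'XXX')")
    unknown_code_rejected = False
except sqlite3.IntegrityError:
    unknown_code_rejected = True
```

With integer keys the same filter would need either a join to `status` or a magic number in the query.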

Note also, some say you should use integers over strings because they compare faster. Not really true. Comparing two 4-byte character fields will take exactly as long as comparing two 4-byte integer fields. Longer strings will, of course, take longer, but if you keep the codes short, there's no difference.

James Curran
+4  A: 

Though it's faster to search and join on an int column (as many have pointed out), it's even faster to never join in the first place. By storing a natural key, you can often eliminate the need for a join.

For a smallish database, the CASCADE updates to the foreign key references wouldn't have much performance impact, unless they were changing extremely often.
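To illustrate the cascading variant, here is a small sketch with the name itself as the primary key, assuming SQLite via Python's `sqlite3`. The `customer` and `invoice` tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # cascades only fire with FK enforcement on
conn.executescript("""
    CREATE TABLE customer (
        name TEXT PRIMARY KEY           -- natural key, changeable by users
    );
    CREATE TABLE invoice (
        id            INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
            REFERENCES customer(name) ON UPDATE CASCADE,
        total         INTEGER NOT NULL
    );
    INSERT INTO customer VALUES ('Acme');
    INSERT INTO invoice (customer_name, total) VALUES ('Acme', 100), ('Acme', 250);
""")

# One UPDATE on the parent; the database rewrites every referencing row.
conn.execute("UPDATE customer SET name = 'Acme Ltd' WHERE name = 'Acme'")

names_on_invoices = [
    row[0] for row in conn.execute("SELECT DISTINCT customer_name FROM invoice")
]
# The invoice table can be queried by name without joining to customer.
acme_total = conn.execute(
    "SELECT SUM(total) FROM invoice WHERE customer_name = 'Acme Ltd'"
).fetchone()[0]
```

The cost of the rename grows with the number of referencing rows, which is why the frequency of changes matters.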

That being said, you should probably use an int or GUID as a surrogate key in this case. An updateable by design PK isn't the best idea, and unless your app has a very compelling business reason to be unique by name - you will inevitably have conflicts.

Mark Brackett
+1  A: 

If your name column will be changing it isn't really a good candidate for a primary key. A primary key should define a unique row of a table. If it can be changed it's not really doing that. Without knowing more specifics about your system I can't say, but this might be a good time for a surrogate key.

I'll also add this in hopes of dispelling the myths of using auto-incrementing integers for all of your primary keys. It is NOT always a performance gain to use them. In fact, quite often it's the exact opposite. If you have an auto-incrementing column that means that every INSERT in the system now has that added overhead of generating a new value.

Also, as Mark points out, with surrogate IDs on all of your tables if you have a chain of tables that are related, to get from one to another you might have to join all of those tables together to traverse them. With natural primary keys that is usually not the case. Joining 6 tables with integers is going to usually be slower than joining 2 tables with a string.

You also often lose the ability to do set-based operations when you have auto-incrementing IDs on all of your tables. Instead of inserting 1000 rows into a parent table, then inserting 5000 rows into a child table, you now have to insert the parent rows one at a time in a cursor or some other loop just to get the generated IDs so that you can assign them to the related children. I've seen a 30 second process turned into a 20 minute process because someone insisted on using auto-incrementing IDs on all of the tables in a database.
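The row-at-a-time pattern described above looks roughly like this. It is a sketch assuming SQLite via Python's `sqlite3`, with made-up `parent` and `child` tables and a tiny made-up data set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (
        id   INTEGER PRIMARY KEY,       -- generated by the database
        name TEXT NOT NULL
    );
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(id),
        label     TEXT NOT NULL
    );
""")

# Hypothetical incoming data: parent name -> labels of its children.
incoming = {"alpha": ["a1", "a2"], "beta": ["b1"]}

# Each parent INSERT has to complete before its children can be written,
# because the children need the id the database just generated.
for name, labels in incoming.items():
    parent_id = conn.execute(
        "INSERT INTO parent (name) VALUES (?)", (name,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO child (parent_id, label) VALUES (?, ?)",
        [(parent_id, label) for label in labels],
    )

child_rows = conn.execute("""
    SELECT p.name, c.label
    FROM child c JOIN parent p ON p.id = c.parent_id
    ORDER BY p.name, c.label
""").fetchall()
```

The loop runs once per parent row, so the number of round trips scales with the parent count rather than staying constant.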

Finally (at least for reasons I'm listing here - there are certainly others), using auto-incrementing IDs on all of your tables promotes poor design. When the designer no longer has to think about what a natural key might be for a table it usually results in erroneous duplicates ending up in the data. You can try to avoid the problem with unique indexes, but in my experience developers and designers don't go through that extra effort and after a year of using their new system they find that the data is a mess because the database didn't have proper constraints on the data through natural keys.
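A small sketch of how that mess shows up, assuming SQLite via Python's `sqlite3`. The `product` table and its `sku` column are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id   INTEGER PRIMARY KEY,   -- only the surrogate key is constrained
        sku  TEXT NOT NULL,
        name TEXT NOT NULL
    );
    -- The same product is entered twice and nothing stops it.
    INSERT INTO product (sku, name) VALUES
        ('W-100', 'Widget'),
        ('W-100', 'Widget'),
        ('G-200', 'Gadget');
""")

# Finding the damage afterwards: natural-key values that occur more than once.
duplicates = conn.execute("""
    SELECT sku, COUNT(*) FROM product
    GROUP BY sku HAVING COUNT(*) > 1
""").fetchall()

# Adding the missing constraint late fails until the duplicates are cleaned up.
try:
    conn.execute("CREATE UNIQUE INDEX product_sku_uq ON product (sku)")
    index_created = True
except sqlite3.DatabaseError:
    index_created = False
```

Declaring the unique index when the table is created would have rejected the second 'W-100' row at insert time instead.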

There's certainly a time for using surrogate keys, but using them blindly on all tables is almost always a mistake.

Tom H.