views:

163

answers:

8

Which one is the best practice and Why?

a) Type Table, Surrogate/Artificial Key

Foreign key is from user.type to type.id: alt text

b) Type Table, Natural Key

Foreign key is from user.type to type.typeName: alt text

+1  A: 

If typeName is a natural key, then it's probably the preferable option, because it won't require a join to get the value.

You should only really use a surrogate key (id) when the name is likely to change.

Bennor McCarthy
+3  A: 

You should probably always use an ID number (that way if you change the type name, you don't need to update the user table) it also allows you to keep your datasize down, as a table full of INTs is much smaller than one full of 45 character varchars.

webdestroya
+5  A: 

The first one is more future proof, because it allows you to change the string representing the type without updating the whole user table. In other words you use a surrogate key, an additional immutable identifier introduced for the sake of flexibility.

Adam Byrtek
+1 for gifting me with the "future proof" phrase.
Tony Ennis
+4  A: 

A good reason to use a surrogate key (instead of a natural key like name) is when the natural key isn't really a good choice in terms of uniqueness. In my lifetime i've known no fewer than 4 "Chris Smith"s. Person names are not unique.

TokenMacGuy
+8  A: 

I believe that in practice, using a natural key is rarely the best option. I would probably go for the surrogate key approach as in your first example.

The following are the main disadvantages of the natural key approach:

  • You might have an incorrect type name, or you may simply want to rename the type. To edit it, you would have to update all the tables that would be using it as a foreign key.

  • An index on an int field will be much more compact than one on a varchar field.

  • In some cases, it might be difficult to have a unique natural key, and this is necessary since it will be used as a primary key. This might not apply in your case.

Daniel Vassallo
thanks for the detailed answer, other answers were great as well
aryaxt
Generally, the best solution is both - have the surrogate key for referential integrity, and the natural key (generally, it's compound) to enforce the tuple's uniqueness. So in the example above, a unique constraint/index would exist on type.typename and on some combination of columns in user.
Adam Musch
+4  A: 

I prefer to use the surrogate key. It is often people will identity and use the natural key which will be fine for a while, until they decide they want to change the value. Then problems start.

Craig
in databases that properly support foreign key semantics, changing this is only tricky from the point of getting the schema right in the first place (update cascades in the right places). If your database doesn't support this well, then you are right in avoiding natural keys.
TokenMacGuy
+1  A: 

Surrogate key for me too, please.

The other might be easier when you need to bang out some code, but it will eventually be harder. Back in the day, my tech boss decided using an email addr as a primary key was a good idea. Needless to say, when people wanted to change their addresses it really sucked.

Tony Ennis
A: 

Use natural keys whenever they work. Names usually don't work. They are too mutable.

If you are inventing your own data, you might as well invent a syntheic key. If you are building a database of data provided by other people or their software, analyze the source data to see how they identify things that need identification.

If they are managing data at all well, they will have natural keys that work for the important stuff. For the unimportant stuff, suit yourself.

Walter Mitty