views:

305

answers:

7

Is there a performance gain or best practice when it comes to using unique, numeric ID fields in a database table compared to using character-based ones?

For instance, if I had two tables:

athlete

id ... 17, name ... Rickey Henderson, teamid ... 28

team

teamid ... 28, teamname ... Oakland

The athlete table, with thousands of players, would be easier to read if the teamid was, say, "OAK" or "SD" instead of "28" or "31". Let's take for granted the teamid values would remain unique and consistent in character form.

I know you CAN use characters, but is it a bad idea for indexing, filtering, etc for any reason?

Please ignore the normalization argument as these tables are more complicated than the example.

+10  A: 

I find primary keys that are meaningless numbers cause less headaches in the long run.

brian
I agree. The only exception to that is when you have to have a primary key that must be unique across multiple tables. In which case, use a Sequence (if oracle or postgresql) or a GUID (if MS Sql Server).
Glenn
+1 for meaningless primary keys!
ninesided
+3  A: 

Text is fine, for all the reasons you mentioned.

If the string is only a few characters, then it will be nearly as small an an integer anyway. The biggest potential drawback to using strings is the size: database performance is related to how many disk accesses are needed. Making the index twice as big, for example, could create disk-cache pressure, and increase the number of disk seeks.

Doug Currie
+3  A: 

I'd stay away from using text as your key - what happens in the future when you want to change the team ID for some team? You'd have to cascade that key change all through your data, when it's the exact thing a primary key can avoid. Also, though I don't have any emperical evidence, I'd think the INT key would be significantly faster than the text one.

Perhaps you can create views for your data that make it easier to consume, while still using a numeric primary key.

rwmnau
+2  A: 

I recommend using ints or bigints for primary keys. Benefits include:

  • This allows for faster joins.
  • Having no semantic meaning in your primary key allows you to change the fields with semantic meaning without affecting relationships to other tables.

You can always have another column to hold team_code or something for "OAK" and "SD". Also

Brian Fisher
+2  A: 

The standard answer is to use numbers because they are faster to index; no need to compute a hash or whatever.

If you use a meaningful value as a primary key you'll have to update it all through you're database if the team name changes.

To satisfy the above, but still make the database directly readable,

  • use a number field as the primary key

  • immediately create a view Athlete_And_Team that joins the Athlete and Team tables

Then you can use the view when you're going through the data by hand.

A: 

Are you talking about your primary key or your clustered index? Your clustered index should be the column which you will use to uniquely identify that row by most often. It also defines the logical ordering of the rows in your table. The clustered index will almost always be your primary key, but there are circumstances where they can be differant.

Charles Graham
+2  A: 

I'm just going to roll with your example. Doug is correct when he says that text is fine. Even for a medium sized (~50gig) database having a 3 letter code be a primary key won't kill the database. If it makes development easier, reduces joins on the other table and it's a field that users would be typing in...I say go for it. Don't do it if it's just an abbreviation that you show on a page or because it makes the athletes table look pretty. I think the key is the question "Is this a code that the user will type in and not just pick from a list?"

Let me give you an example of when I used a text column for a key. I was making software for processing medical claims. After the claim got all digitized a human had to look at the claim and then pick a code for it that designated what kind of claim it was. There were hundreds of codes...and these guys had them all memorized or crib sheets to help them. They'd been using these same codes for years. Using a 3 letter key let them just fly through the claims processing.

Al W