When I am creating a new database table, what factors should I take into account for selecting the primary key's data type?

+7  A: 

If using a numeric key, make sure the data type is large enough to hold the number of rows you expect the table to grow to.
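One way to sanity-check this is to compare the expected row count, plus some headroom, against the maximum of a signed 32-bit and 64-bit integer. This is a minimal Python sketch. The function name `pick_key_type` and the 10x safety factor are hypothetical choices, and the type names assume SQL Server's `int`/`bigint`:

```python
# Largest value a signed N-bit integer key can hold. Assumes keys start at 1
# and are never reused; identity/auto-increment gaps reduce real headroom.
INT32_MAX = 2**31 - 1   # e.g. SQL Server "int"
INT64_MAX = 2**63 - 1   # e.g. SQL Server "bigint"


def pick_key_type(expected_rows: int, safety_factor: float = 10.0) -> str:
    """Return a key type name that leaves headroom for growth.

    safety_factor is a hypothetical margin for gaps (rolled-back inserts,
    deleted rows) and growth beyond the estimate.
    """
    needed = expected_rows * safety_factor
    if needed <= INT32_MAX:
        return "int"
    if needed <= INT64_MAX:
        return "bigint"
    raise ValueError("even a 64-bit key is too small for this estimate")


print(pick_key_type(50_000_000))     # int     (5e8 fits under ~2.1e9)
print(pick_key_type(1_000_000_000))  # bigint  (1e10 exceeds ~2.1e9)
```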

If using a GUID, does the extra space needed to store it need to be considered? Will coding against GUID PKs be a pain for developers or users of the application?

If using composite keys, are you sure that the combined columns will always be unique?
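The database can enforce that the combination is unique, but it cannot tell you whether the combination will stay unique in the real world. A minimal sketch using Python's built-in `sqlite3` module as a stand-in database; the `order_line` table and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (order, line number) pair.
# NOT NULL matters: SQLite allows NULLs in non-integer PK columns otherwise.
conn.execute("""
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL,
        line_no  INTEGER NOT NULL,
        sku      TEXT    NOT NULL,
        PRIMARY KEY (order_id, line_no)
    )
""")
conn.execute("INSERT INTO order_line VALUES (1, 1, 'A-100')")
conn.execute("INSERT INTO order_line VALUES (1, 2, 'B-200')")  # same order, new line: OK

try:
    # Same (order_id, line_no) pair again: rejected by the composite key.
    conn.execute("INSERT INTO order_line VALUES (1, 1, 'C-300')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```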

Ely
A: 

I almost always use an integer, but here's an interesting perspective.

http://www.codinghorror.com/blog/archives/000817.html

RG
A: 

I'm partial to using a generated integer key. If you expect the database to grow very large, you can go with bigint.

Some people like to use GUIDs. The pro is that you can merge multiple instances of the database without altering any keys; the con is that performance can suffer.
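A rough illustration of the merge argument, using two in-memory SQLite databases as stand-ins for two instances (the `customer` table and `make_db` helper are hypothetical). SQLite has no native GUID type, so the UUIDs are stored as text here; SQL Server would use `uniqueidentifier`:

```python
import sqlite3
import uuid


def make_db(key_type: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # "id INTEGER PRIMARY KEY" is auto-assigned by SQLite; "id TEXT PRIMARY KEY" is not.
    conn.execute(f"CREATE TABLE customer (id {key_type} PRIMARY KEY, name TEXT)")
    return conn


# Two independent instances that both hand out integer keys starting at 1.
site_a, site_b = make_db("INTEGER"), make_db("INTEGER")
site_a.execute("INSERT INTO customer (name) VALUES ('Alice')")  # gets id 1
site_b.execute("INSERT INTO customer (name) VALUES ('Bob')")    # also gets id 1

int_ids_a = {r[0] for r in site_a.execute("SELECT id FROM customer")}
int_ids_b = {r[0] for r in site_b.execute("SELECT id FROM customer")}
print(int_ids_a & int_ids_b)  # {1}: merging would require remapping keys

# The same two instances, but each row gets a random UUID as its key.
guid_a, guid_b = make_db("TEXT"), make_db("TEXT")
guid_a.execute("INSERT INTO customer VALUES (?, 'Alice')", (str(uuid.uuid4()),))
guid_b.execute("INSERT INTO customer VALUES (?, 'Bob')", (str(uuid.uuid4()),))

guid_ids_a = {r[0] for r in guid_a.execute("SELECT id FROM customer")}
guid_ids_b = {r[0] for r in guid_b.execute("SELECT id FROM customer")}
print(guid_ids_a & guid_ids_b)  # expected: set(), rows can be copied across as-is
```

Random UUIDs can in principle collide, but the probability is negligible; integer keys from separate instances collide as soon as both have rows.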

TrickyNixon
A: 

For a "natural" key, whatever datatype suits the column(s). Artificial (surrogate) keys are usually integers.

Tony Andrews
A: 

It all depends.

a) Are you fine with unique, sequential numbers as your primary key? If yes, an integer identity column will suffice.
b) If your business requires an alphanumeric primary key, then you have to go with varchar or nvarchar.

These are the two options I could think of.

Pradeep
+6  A: 

In most cases I use an identity int primary key, unless the scenario requires a lot of replication, in which case I may opt for a GUID.

I (almost) never use meaningful keys.

Galwegian
A: 

Whenever possible, try to use a primary key that is a natural key. For instance, if I had a table where I logged one record every day, the log date would be a good primary key. Otherwise, if there is no natural key, just use int. If you think you will need more than about 2 billion rows (the limit of a signed 32-bit int), use a bigint. Some people like to use GUIDs, which work well, as they are unique and you will never run out of values. However, they are needlessly long, and hard to type in if you are just doing ad hoc queries.
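A minimal sketch of the log-date example with Python's `sqlite3` module (the `daily_log` table and the dates are hypothetical). It also shows the weak spot: the key only works while the "one record per day" rule holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Natural key: one row per calendar day, so the date itself is the primary key.
conn.execute("CREATE TABLE daily_log (log_date TEXT PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO daily_log VALUES ('2008-10-01', 'nightly batch ok')")

try:
    # If the rule changes to "one record per hour", a second entry for the
    # same day collides with the existing key and the schema has to change.
    conn.execute("INSERT INTO daily_log VALUES ('2008-10-01', 'noon batch ok')")
    second_entry_ok = True
except sqlite3.IntegrityError:
    second_entry_ok = False

print(second_entry_ok)  # False
```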

Kibbee
I don't like this because business rules change. Maybe in the future you will need to log one record every hour because things have gotten busier, or one record per store location instead of one record.
CindyH
Good point Cindy. Also, anything other than a GUID sucks for replication.
Jon Tackabury
I think it should be the other way around: a surrogate key must be the first choice; a natural key may be used only when there is some really, really strong reason to do so. According to my own experience, natural keys could be justified only in toyish, one-off projects...
Yarik
+1  A: 

Do not use a floating point numeric type, since floating point numbers cannot be properly compared for equality.
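A short Python illustration of the problem, with a dict lookup standing in for an index seek. Database FLOAT/REAL columns typically use the same IEEE 754 binary representation, so equality predicates on them run into the same issue:

```python
# Floating-point numbers are binary, so most decimal fractions are approximated.
a = 0.1 + 0.2
b = 0.3
print(a == b)   # False
print(repr(a))  # 0.30000000000000004

# A lookup keyed on a computed float can therefore miss the "same" value.
rows = {0.3: "row stored under 0.3"}
print(rows.get(0.1 + 0.2))  # None
```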

Jeffrey L Whitledge
Who would ever think of using a floating point number as an index!
Philippe Grondier
I actually know at least three developers who have tried this on occasion. (Not me, of course, because I read _Code Complete_ early on.) Just because something appears self-evidently bad doesn't mean people won't do it.
Jeffrey L Whitledge
Why on Earth would anyone downvote Jeffrey's response???
Yarik
A: 

Hi Aaron

  • Where do you generate it? Incrementing numbers don't work well for keys generated on the client.
  • Do you want a data-dependent or data-independent key? (Sometimes you could use an ID from business data; I can't say whether this is always useful.)
  • How well can this type be indexed by your DB?

I have used uniqueidentifiers (GUIDs) or incrementing integers so far.

Cheers Matthias

Mudu
A: 

A great factor is how much data you're going to store. I work for a web analytics company, and we have LOADS of data. So a GUID primary key on our pageviews table would kill us, due to the size.

A rule of thumb: for high performance, you should be able to store your entire index in memory. GUIDs could easily break this!
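A back-of-the-envelope sketch of raw key storage, assuming a hypothetical table of one billion rows and counting only the key bytes (no page overhead, row pointers, or fill factor):

```python
import uuid

INT_BYTES = 4                          # 32-bit integer key
BIGINT_BYTES = 8                       # 64-bit integer key
GUID_BYTES = len(uuid.uuid4().bytes)   # 16 bytes for a binary GUID/UUID


def key_bytes(rows: int, key_size: int) -> int:
    """Raw key bytes only; ignores page overhead, row pointers and fill factor."""
    return rows * key_size


rows = 1_000_000_000  # hypothetical pageviews table size
for name, size in [("int", INT_BYTES), ("bigint", BIGINT_BYTES), ("guid", GUID_BYTES)]:
    print(f"{name:>6}: {key_bytes(rows, size) / 2**30:.1f} GiB")
```

That works out to roughly 3.7 GiB of key data for int versus 14.9 GiB for GUIDs, before any other columns. In SQL Server the clustered key is also carried in every nonclustered index as the row locator, which multiplies the difference.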

MartinHN
A: 

Kimberly Tripp goes in depth on indexing strategies in this podcast.

TGnat
+6  A: 

I don't really like what they teach in school, that is, using a 'natural key' (for example the ISBN in a book database) or even having a primary key made up of two or more fields. I would never do that. So here's my little advice:

  • Always have one dedicated column in every table for your primary key.
  • They should all have the same column name across all tables, e.g. "ID" or "GUID"
  • Use GUIDs when you can (if you don't need performance), otherwise incrementing INTs

EDIT:
Okay, I think I need to explain my choices a little bit.

  • Having a dedicated column named the same across all tables for your primary key just makes your SQL statements a lot easier to construct, and easier for someone else (who might not be familiar with your database layout) to understand. This helps especially when you're doing lots of JOINs and the like. You won't need to look up the primary key for a specific table; you already know it, because it's the same everywhere.

  • GUIDs vs. INTs doesn't really matter that much most of the time. Unless you hit the performance cap of GUIDs or are doing database merges, you won't have major issues with either. BUT there's a reason I prefer GUIDs: their global uniqueness may come in handy some day. Maybe you don't see a need for it now, but synchronizing parts of the database to a laptop or cell phone, or finding data records without needing to know which table they're in, are great examples of the advantages GUIDs can provide. An integer only identifies a record within the context of one table, whereas a GUID identifies a record everywhere.

Jan Gressmann
But if you go with this technique, you must ensure that there is a UNIQUE constraint on the 2 or more columns that might otherwise be the primary key.
Jonathan Leffler
You've told us what you like but you've not told us **why**
Seun Osewa
If you don't need performance (i.e. a small table), why use something like a GUID? GUIDs should really only be used when you plan to have a lot of data.
Kibbee
Even if you do have a lot of data, has anyone seriously overflowed a bigint field? I thought the purpose of a GUID was the need for it to be unique outside the bounds of its table.
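Some rough arithmetic on that question, assuming a hypothetical sustained rate of one million inserts per second with no gaps in the key sequence:

```python
BIGINT_MAX = 2**63 - 1                # largest signed 64-bit value
SECONDS_PER_YEAR = 365.25 * 24 * 3600

inserts_per_second = 1_000_000        # hypothetical, sustained around the clock
years = BIGINT_MAX / inserts_per_second / SECONDS_PER_YEAR
print(f"{years:,.0f} years")          # 292,271 years
```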
Adam Lassek
I wish I could vote this answer up more than once. Your edited explanations are spot on.
Jon Tackabury
I agree, except for naming all the key fields "ID". I used to do this, but got really tired of having to rename fields in joins, so now the primary key is always named TableNameId.
Steven A. Lowe
The single "id" name is unbearable when you have to build complex SQL statements. Id_TableName or TableNameId is definitely better!
Philippe Grondier
Why? Just use aliases, e.g. WHERE TABLE_ONE.ID = TABLE_TWO.ID
Jan Gressmann
Jan - I agree with you "use ID in all tables". It's so much more natural for me to write EMPLOYEE_PAYCHECK.EMPLOYEE_ID = EMPLOYEE.ID than anything else.
Jess
+1  A: 

Numbers that have meaning in the real world are usually a bad idea, because every so often the real world changes the rules about how those numbers are used, in particular to allow duplicates, and then you've got a real mess on your hands.

JohnMcG
A: 

Use natural keys when they can be trusted. Some sources of natural keys can't be trusted. Years ago, the Social Security Administration used to occasionally mess up and assign the same SSN to two different people. They've probably fixed that by now.

You can probably trust VINs for vehicles, and ISBNs for books (but not for pamphlets, which may not have an ISBN).

If you use natural keys, the natural key will determine the datatype.

If you can't trust any natural keys, create a synthetic key. I prefer integers for this purpose. Leave enough room for reasonable expansion.

Walter Mitty
"Use natural keys when they can be trusted": natural keys cannot always be trusted. So don't use natural keys!
Philippe Grondier
Unless you deal with some toyish, short-term, one-off project - never, ever trust any natural keys. Eventually natural keys fail. Sometimes much sooner than anyone would expect. Quite the contrary, surrogate/synthetic keys are bullet-proof. By definition.
Yarik
+3  A: 

Unless you have an ultra-convenient natural key available, always use a synthetic (a.k.a. surrogate) key of a numeric type. Even if you do have a natural key available, you might want to consider using a synthetic key anyway and placing an additional unique index on your natural key. Consider what happened to higher-ed databases that used social security numbers as PKs when federal law changed: the costs of changing over to synthetic keys were enormous.
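A minimal sketch of "synthetic key plus unique index on the natural key", using Python's `sqlite3` module. The `student`/`enrollment` tables and the replacement identifier are hypothetical. The point is that replacing the natural key touches one row, while the UNIQUE constraint still guards it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,   -- synthetic key, never changes
        ssn        TEXT NOT NULL UNIQUE,  -- natural key, still kept unique
        name       TEXT NOT NULL
    );
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course     TEXT    NOT NULL,
        PRIMARY KEY (student_id, course)
    );
""")
conn.execute("INSERT INTO student (ssn, name) VALUES ('123-45-6789', 'Pat')")  # student_id 1
conn.execute("INSERT INTO enrollment VALUES (1, 'CS101')")

# Replacing the natural key is a one-row update; enrollment rows are untouched
# because they reference student_id, not the SSN.
conn.execute("UPDATE student SET ssn = 'X0001' WHERE student_id = 1")

try:
    # The UNIQUE constraint still stops two students sharing one natural key.
    conn.execute("INSERT INTO student (ssn, name) VALUES ('X0001', 'Sam')")
    dup_allowed = True
except sqlite3.IntegrityError:
    dup_allowed = False

joined = conn.execute("""
    SELECT s.ssn, e.course
    FROM enrollment e JOIN student s ON s.student_id = e.student_id
""").fetchall()
print(joined, dup_allowed)  # [('X0001', 'CS101')] False
```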

Also, I have to disagree with the practice of naming every primary key the same, e.g. "id". This makes queries harder to understand, not easier. Primary keys should be named after the table. For example employee.employee_id, affiliate.affiliate_id, user.user_id, and so on.
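One practical effect of naming the key after the table: when the foreign key has the same name, standard SQL's `JOIN ... USING` works, and the result has a single, self-describing `employee_id` column. A minimal sketch with Python's `sqlite3` module (tables are hypothetical; note that not every engine supports `USING`, SQL Server for one does not):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE paycheck (
        paycheck_id  INTEGER PRIMARY KEY,
        employee_id  INTEGER REFERENCES employee(employee_id),
        amount_cents INTEGER
    );
    INSERT INTO employee VALUES (1, 'Ann');
    INSERT INTO paycheck VALUES (10, 1, 250000);
""")

# Same column name in both tables, so the join condition is just USING (employee_id).
cur = conn.execute("""
    SELECT employee_id, name, paycheck_id, amount_cents
    FROM paycheck JOIN employee USING (employee_id)
""")
print([d[0] for d in cur.description])  # ['employee_id', 'name', 'paycheck_id', 'amount_cents']
rows = cur.fetchall()
print(rows)                             # [(1, 'Ann', 10, 250000)]
```

With a plain `id` convention the same join would read `ON paycheck.employee_id = employee.id`, which some find just as clear; as the comments note, this is largely a matter of taste.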

Noah Yetter
I agree with the first point and disagree with the second. But the second is probably less critical than the first. Hence an upvote.
Yarik
I agree on both points but the second is purely taste - tabs vs. spaces, /*...*/ vs //...
Nick
A: 

I usually go with a GUID column primary key for all tables (rowguid in MSSQL). What could be natural keys, I make unique constraints. A typical example would be a product identification number that the user has to make up and ensure is unique. If I need a sequence, as for an invoice, I build a table to keep the last number and a stored procedure to ensure serialized access. Or a sequence in Oracle :-) I hate the "social security number" example for natural keys, as that number will not always be available in a registration process, resulting in a need for a scheme to generate dummy numbers.
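SQLite has neither stored procedures nor sequences, so this minimal sketch swaps in a transaction around a counter table to get the same serialized access. The `counter` table and `next_invoice_number` function are hypothetical. Numbers are only gap-free if the invoice row is inserted inside the same transaction:

```python
import sqlite3

# isolation_level=None: we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE counter (name TEXT PRIMARY KEY, last_number INTEGER NOT NULL)")
conn.execute("INSERT INTO counter VALUES ('invoice', 0)")


def next_invoice_number(conn: sqlite3.Connection) -> int:
    """Hand out sequential invoice numbers, one caller at a time.

    BEGIN IMMEDIATE takes SQLite's write lock up front, so concurrent
    connections queue up instead of reading the same last_number.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("UPDATE counter SET last_number = last_number + 1 WHERE name = 'invoice'")
        (n,) = conn.execute("SELECT last_number FROM counter WHERE name = 'invoice'").fetchone()
        conn.execute("COMMIT")
        return n
    except Exception:
        conn.execute("ROLLBACK")
        raise


print([next_invoice_number(conn) for _ in range(3)])  # [1, 2, 3]
```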

Tom
+9  A: 
Philippe Grondier
This is what I have done too. Downside: (a) sometimes you need to make extra joins e.g. you have invoice_line_item.invoice_id but you really want the invoice_number. (b) can be a pain to compare across databases (c) overhead on big tables with few columns (d) can impact ability to partition
WW
There are some "downsides", but the main advantage is that "it always works"...
Philippe Grondier
Great explanation - I do the same, then sometimes add a unique constraint/key on whatever would have been the natural key.
Jess