views:

825

answers:

5

I'm considering altering some tables to use nvarchar(50) as primary key instead of an int primary key. Using an int ID for a key really is irrelevant data, it's the string I'm interested in. What sort of performance hit will occur, or where do you research this? Other than cut and try that is.

+3  A: 

Consider using a surrogate key (an int primary key) as the primary key/clustered index key. The trouble with using a nvarchar(50) as a primary key/clustered index key is that your table will be ordered by that key which means it is likely to get highly fragmented, and that any other indexes will have the burden of referencing this heavy primary key.

Another issue is that presumably you are needing to JOIN on other tables by this type of value which is a more expensive operation as the size of the key grows.

I think there are very few situations where an nvarchar(50) primary key would make sense.

Generally, primary keys should be a surrogate UNLESS you have a small natural immutable key. Arguably, SSN, for example, could be considered a natural immutable key.

Hafthor
Kim Tripp talked about this topic in a recent "Run As Radio" topic. http://www.runasradio.com/default.aspx?showNum=76
Hafthor
+1  A: 

For performance, I normally ask the following:

  • how many rows? 1,000 or 1,000,000 or 10,000,000 ??

  • what server is it sitting on? (memory, diskspace)

I would profile it and then see. Normally for me, the bottleneck is not the database, it's poorly written code, badly deployed etc. etc...

Christian Payne
+14  A: 

You have hit upon one of the major "holy wars" of database design. The debate you're referring to is the "surrogate vs. natural key" argument that's been raging for as long as there have been RDBMSs (as nearly as I can tell).

The debate essentially boils down to whether a representative key (surrogate, for example an IDENTITY column) should be used versus using the actual data that uniquely describes a record (natural key).

I will say that there is no "right" answer. Performance measures are an artifact of the platform, and should be assessed by experimentation, but performance is not likely to be the major concern.

What I consider to be the primary argument for surrogate keys is the immutability of primary keys. If you choose to use a natural key, you give up the option of altering that key after it is established. You also give up the possibility that it may become non-unique at some point in the future. For those reasons, I typically (not always) use surrogate keys for most of my tables.

However, as I mentioned, there is a very long-standing debate filled with discussions of indexing strategies and normal-form adherance to be read if you are so inclined.

I would Google "surrogate vs. natural keys". Here are a few links to get you started:

Systems Engineering and RDBMS

Techrepublic

Tony Rogerson's blog

Hope this helps.

Jared
A: 

Why UNICODE? e.g. if I translated an English word into Chinese Han characters, would they be considered to be duplicates?

Why variable? Fixed width is a good physical characteristic of a key.

Why 50 characters? That's a lot of keying for users (I agree "an int ID for a key really is irrelevant data" and think such so-called 'surrogate keys' should never be exposed to end users, BTW).

Also, for me NVARCHAR(50) is a bit of a 'smell': a Microsoft default, a straight port from MS Access, perhaps? This doesn't mean you haven't given due thought and consideration to your key, of course, just one of those things to maybe review.

Oh hang on: you meant specifically PRIMARY KEY, right? Assuming you explicitly use your one (per table) clustered index, AFAIK PRIMARY KEY designation has no physical implications in SQL Server land. Of course, all your candidate keys should be covered by NOT NULL UNIQUE constraints; the one you choose to promote to PRIMARY key is arbitrary.

onedaywhen
A: 

In order to definitely burn out all arguments proposed by the leaders of the natural key solution (cf surrogate vs natural key war), and to make it short, I should say that surrogate keys ALWAYS work, while natural keys have an loosy tendancy to lead to problems and frustration, usually at unexpected times.

I do not say that they are THE optimal solution to every situation, but to avoid loosing your (and other's) time thinking of the proper parameters for the best natural key when creating a table, just choose surrogate, and it's done. And if your table seems to have a proper natural key, just add it as a field with a (unique?) index.

And to make it easy for developers, always have your first field as a primary key, the second one being the supposed/pseudo natural key. Your table should just look like that:

Tbl_whatever
     id_whatever, unique identifier, primary key
     code_whatever, nvarchar(your favorite length), indexed
     .....

Where id_ is the prefix for primary key, and code_ is used for "natural" indexed field

Philippe Grondier