SQL Server performance difference with single or multi column primary key?

It depends on your access patterns, read/write ratio and whether (possibly most importantly) the clustered index is defined on the Primary Key.

Rule of thumb is make your primary key as small as possible (32 bit int) and define the clustered index on a monotonically increasing key (think IDENTITY) where possible, unless you have range searches that form a large proportion of the queries against that table.

If your application is write intensive, and you define the clustered index on the GUID column you should note:

All non-clustered indexes will contain the clustered index key and will therefore be larger. This may have a negative effect of performance if there are many NC indexes.
Unless you are using an 'ordered' GUID (such as a COMB or using NEWSEQUENTIALID()), your inserts will fragment the index over time. This means you need a regular index rebuild and possibly increasing the amount of free space left in pages (fill factor)

Because there are many factors at work (hardware, access patterns, data size), I suggest you run some tests and benchmark your particular circumstances..

You will be affected more by (each) component of the key being (a) variable length and (b) the width [wide instead of narrow columns], than the number of components in the key. Unless MS have broken it again in the latest release (they broke Heaps in 2005). Datatype does not slow it down; the width, and particularly variable length (any datatype) does. Note that a fixed len column is made variable if it is set to Nullable. Variable len columns in indices is bad news, because a bit of "unpacking" has to be performed on every access, to get at the data.

Obviously, keep indexed columns as narrow as possible, using fixed, and not Nullable columns only.

In terms of number of columns in a compound key, sure one column is faster than seven, but not that much: three fat wide variable columns are much slower than seven thin fixed columns.

GUID is of course a very fat key; GUID plus anything else is very very fat; GUID Nullable is Guiness material. Unfortunately it is the knee-jerk reaction to solving the IDENTITY problem, which in turn is a consequence of not having chosen good natural relational keys. So you are best advised to fix the real problem at the source, and choose good natural keys; avoid IDENTITY; avoid GUID.

Experience and performance tuning, not conjecture.

My situation is going to be write-heavy with very little reading, with write speed more critical than read speed (although read speed still needs to be ok!).

mrnye 2010-10-25 02:56:03

Would the downvoter please leave a comment and point out exactly what they disagree with. Thanks.

Mitch Wheat 2010-10-25 03:12:55

I'm not the downvoter by the way, but thanks for the additional info. I'm in early stages of new DB design and as such it's pretty hard to get some benchmarks / tests going. I was hoping to get some others experience to help decide on what to do. I will take your comments into consideration, thanks :)

mrnye 2010-10-25 04:16:25

Not the 'downvoter' either. Your statements are correct, but your conclusions are incorrect. Eg. for a GUID, the idea is to purposely obtain a key that approaches "random", and thus obtain distributed Inserts (avoid NextSequential hotspot); therefore one must use FillFactor. I would vote you down.

PerformanceDBA 2010-10-25 22:56:04

@PerformanceDBA: suggest you re-read. I suggest increasing Fill Factor.

Mitch Wheat 2010-10-26 00:47:04

@PerformanceDBA: The primary reason GUIDs are used is so that a middle tier can generate keys without a round trip to Database.

Mitch Wheat 2010-10-26 00:50:58

If you set fillfactor correctly for the expected [GUID, or other random] inserts, the CI will not be fragmented, and you do not need a regular index "rebuild". I did say "your sttmts were correct but..." I am aware of the reasons for GUID, that is just one of them, not primary.

PerformanceDBA 2010-10-26 01:51:36

@PerformanceDBA: your statement "If you set fillfactor correctly for the expected [GUID, or other random] inserts, the CI will not be fragmented" is incorrect. After a certain number of inserts there will be no more room in a page, and it will have to be split. Yes, a GUID is a 'random' insert, but you will still eventually fill pages and require page splits.

Mitch Wheat 2010-10-26 03:05:23

Groan. There is a limit to the length of comments. So far, you're right and I'm right. Yes, of course, there's more, but that discussing that will require even more interaction. I appreciate you and I are both trying to help OP, we are on the same side. If you really think this exchange is warranted, ask a new question, so that we can close this. Let's not hijack this thread.

PerformanceDBA 2010-10-26 04:02:42

what does "Experience and performance tuning, not conjecture." mean?

Mitch Wheat 2010-10-26 03:06:44

"Unless MS have broken it again in the latest release (they broke Heaps in 2005)." - care to elaborate?

Mitch Wheat 2010-10-26 03:07:11

@PerformanceDBA: what and others don't?

Mitch Wheat 2010-10-26 04:11:28

We (as consultants), used to provide advise, based on experience and benchmarks at many sites, re the speed of a Heap vs a well designed CI (specific uses). Heaps were faster for certain operations. We were surprised to find with 2005, MVPs advising that CIs were faster. When we benchmarked again, we identified the truth: the speed of CIs had not changed; MS had broken Heaps, they were then actually slower than CIs for said operations; the "CIs are faster" is just the usual MS marketing spin to cover the reality of negative performance.

PerformanceDBA 2010-10-26 04:13:27

PerformanceDBA 2010-10-26 04:15:34

This is getting boring. [All] "others don't" is your interpretation. I was commenting on specific others that don't. Taking everything personally. Those who get into long exchanges in the margins, forgetting the fact that we are here to answer OP, not for our own personal battles.

PerformanceDBA 2010-10-26 04:22:21

Thanks for your post, it helped a lot. For the record, I have decided upon an int (4byte) + char(4) as a primary key field (with a datetime being added for timeseries tables). The company had previously been using GUIDs for IDs everywhere, but after some thought, it was found that they were complete overkill.

mrnye 2010-10-28 22:56:28

It was my pleasure.

PerformanceDBA 2010-10-29 14:10:24

ansaurus

tags:

views:

answers:

SQL Server performance difference with single or multi column primary key?

related questions