views:

77

answers:

3

Is there any difference in performance (in terms of inserting/updating & querying) a table if the primary key is a single column (e.g., a GUID generated for every row) or multiple columns (e.g., a foreign key GUID + an offset number)?

I would assume querying speeds should be quicker if anything with multi-column primary keys, however I would imagine inserting would be slower due to a slightly more complicated unique check? I also imagine the data types of a multi-column primary key could also matter (e.g., if one of the columns was a DateTime type it would add complexity). These are just my thoughts to invoke answers & discussion (hopefully!) and are not fact based.

I realise there are some other questions covering this topic, but I'm wondering about performance impacts rather than management/business concerns.

+1  A: 

It depends on your access patterns, read/write ratio and whether (possibly most importantly) the clustered index is defined on the Primary Key.

Rule of thumb is make your primary key as small as possible (32 bit int) and define the clustered index on a monotonically increasing key (think IDENTITY) where possible, unless you have range searches that form a large proportion of the queries against that table.

If your application is write intensive, and you define the clustered index on the GUID column you should note:

  1. All non-clustered indexes will contain the clustered index key and will therefore be larger. This may have a negative effect of performance if there are many NC indexes.

  2. Unless you are using an 'ordered' GUID (such as a COMB or using NEWSEQUENTIALID()), your inserts will fragment the index over time. This means you need a regular index rebuild and possibly increasing the amount of free space left in pages (fill factor)

Because there are many factors at work (hardware, access patterns, data size), I suggest you run some tests and benchmark your particular circumstances..

Mitch Wheat
My situation is going to be write-heavy with very little reading, with write speed more critical than read speed (although read speed still needs to be ok!).
mrnye
Would the downvoter please leave a comment and point out exactly what they disagree with. Thanks.
Mitch Wheat
I'm not the downvoter by the way, but thanks for the additional info. I'm in early stages of new DB design and as such it's pretty hard to get some benchmarks / tests going. I was hoping to get some others experience to help decide on what to do. I will take your comments into consideration, thanks :)
mrnye
Not the 'downvoter' either. Your statements are correct, but your conclusions are incorrect. Eg. for a GUID, the idea is to purposely obtain a key that approaches "random", and thus obtain distributed Inserts (avoid NextSequential hotspot); therefore one must use FillFactor. I would vote you down.
PerformanceDBA
@PerformanceDBA: suggest you re-read. I suggest increasing Fill Factor.
Mitch Wheat
@PerformanceDBA: The primary reason GUIDs are used is so that a middle tier can generate keys without a round trip to Database.
Mitch Wheat
If you set fillfactor correctly for the expected [GUID, or other random] inserts, the CI will not be fragmented, and you do not need a regular index "rebuild". I did say "your sttmts were correct but..." I am aware of the reasons for GUID, that is just one of them, not primary.
PerformanceDBA
@PerformanceDBA: your statement "If you set fillfactor correctly for the expected [GUID, or other random] inserts, the CI will not be fragmented" is incorrect. After a certain number of inserts there will be no more room in a page, and it will have to be split. Yes, a GUID is a 'random' insert, but you will still eventually fill pages and require page splits.
Mitch Wheat
Groan. There is a limit to the length of comments. So far, you're right and I'm right. Yes, of course, there's more, but that discussing that will require even more interaction. I appreciate you and I are both trying to help OP, we are on the same side. If you really think this exchange is warranted, ask a new question, so that we can close this. Let's not hijack this thread.
PerformanceDBA
A: 

It depends on the indexing and storage in each case. All other things being equal, the choice of primary key is irrelevant as far as performance is concerned. The choice of indexes and other storage options would be the deciding factor.

dportas
+2  A: 

You will be affected more by (each) component of the key being (a) variable length and (b) the width [wide instead of narrow columns], than the number of components in the key. Unless MS have broken it again in the latest release (they broke Heaps in 2005). Datatype does not slow it down; the width, and particularly variable length (any datatype) does. Note that a fixed len column is made variable if it is set to Nullable. Variable len columns in indices is bad news, because a bit of "unpacking" has to be performed on every access, to get at the data.

Obviously, keep indexed columns as narrow as possible, using fixed, and not Nullable columns only.

In terms of number of columns in a compound key, sure one column is faster than seven, but not that much: three fat wide variable columns are much slower than seven thin fixed columns.

GUID is of course a very fat key; GUID plus anything else is very very fat; GUID Nullable is Guiness material. Unfortunately it is the knee-jerk reaction to solving the IDENTITY problem, which in turn is a consequence of not having chosen good natural relational keys. So you are best advised to fix the real problem at the source, and choose good natural keys; avoid IDENTITY; avoid GUID.

Experience and performance tuning, not conjecture.

PerformanceDBA
what does "Experience and performance tuning, not conjecture." mean?
Mitch Wheat
"Unless MS have broken it again in the latest release (they broke Heaps in 2005)." - care to elaborate?
Mitch Wheat
@PerformanceDBA: what and others don't?
Mitch Wheat
We (as consultants), used to provide advise, based on experience and benchmarks at many sites, re the speed of a Heap vs a well designed CI (specific uses). Heaps were faster for certain operations. We were surprised to find with 2005, MVPs advising that CIs were faster. When we benchmarked again, we identified the truth: the speed of CIs had not changed; MS had broken Heaps, they were then actually slower than CIs for said operations; the "CIs are faster" is just the usual MS marketing spin to cover the reality of negative performance.
PerformanceDBA
PerformanceDBA
This is getting boring. [All] "others don't" is your interpretation. I was commenting on specific others that don't. Taking everything personally. Those who get into long exchanges in the margins, forgetting the fact that we are here to answer OP, not for our own personal battles.
PerformanceDBA
Thanks for your post, it helped a lot. For the record, I have decided upon an int (4byte) + char(4) as a primary key field (with a datetime being added for timeseries tables). The company had previously been using GUIDs for IDs everywhere, but after some thought, it was found that they were complete overkill.
mrnye
It was my pleasure.
PerformanceDBA