There is a requirement to use GUID(s) as primary keys. Am I right in thinking that

ProductID UNIQUEIDENTIFIER NOT NULL 
ROWGUIDCOL DEFAULT (NEWSEQUENTIALID()) PRIMARY KEY CLUSTERED

will give the fastest select for the where clause

productid in ( guid1 , guid2 ,..., guidn )

and won't slow down a separate select that goes through a non-clustered index, such as

natural_key like 'Something%'

The table is queried only by users and is created/recreated programmatically from scratch.

A: 

A Clustered index is best suited to range searches, so it might satisfy your query:

productid in ( guid1 , guid2 ,..., guidn )

but it depends on what else you are selecting, grouping by, ordering by, etc., if the index is to be a covering index. Otherwise the optimiser might pick another non-clustered index, followed by a lookup into the clustered index. It also depends to some extent on the number of rows in the table.

Also, I think you might want to use NEWID() as opposed to NEWSEQUENTIALID()

Mitch Wheat
A disparate list of distinct values in an IN clause isn't exactly a range query....
marc_s
@marc_s: that's a good point! But they are sequential GUIDs, as posed in the original question, so I think they might occur as a range.
Mitch Wheat
The original sequence is random, as returned from the Lucene.Net full-text index, but if it speeds up the query, it's no problem to sort the GUIDs in memory.
MicMit
+2  A: 

Using GUIDs as the clustered index will most definitely hurt your performance. Even with NEWSEQUENTIALID(), the GUIDs aren't really sequential - they're only partially so. Their inherent randomness will lead to higher index fragmentation and thus to less optimal seek times.

Additionally, if you have a 16-byte GUID as your clustered key, it will be added to every non-clustered index on that table. That might not sound so bad, but with 10 million rows and 10 non-clustered indexes, using a 16-byte GUID instead of a 4-byte INT wastes about 1.2 GB of storage - and not just on disk (which is cheap), but also in your SQL Server's memory (since SQL Server always loads entire 8 KB pages into 8 KB blocks of memory, no matter how full or empty they are).
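The 1.2 GB figure can be checked with a quick back-of-the-envelope query. This is only a rough sketch: it counts the extra key bytes per row in each non-clustered index and ignores page overhead and fill factor.

```sql
-- Extra bytes = rows * non-clustered indexes * (GUID size - INT size)
SELECT CAST(10000000 AS BIGINT)   -- 10 million rows
       * 10                       -- 10 non-clustered indexes
       * (16 - 4)                 -- 16-byte GUID vs. 4-byte INT
       AS extra_bytes;            -- 1,200,000,000 bytes, i.e. about 1.2 GB
```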

I can see the point of using a GUID as a primary key - the almost 100% guarantee of uniqueness is appealing to developers. BUT: as a clustered key, they're a nightmare for your database.

My best practice: if I really need a GUID as primary key, I add a 4-byte INT IDENTITY to the table which then serves as the clustered key - the results are way better that way!
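A minimal sketch of that layout might look like the following. The table name dbo.Product, the ProductSeq column, the natural_key type, and all constraint and index names are assumptions made up for illustration; only ProductID and natural_key come from the question.

```sql
-- GUID stays the primary key, but as a NONCLUSTERED one.
CREATE TABLE dbo.Product
(
    ProductID   UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL
        CONSTRAINT DF_Product_ProductID DEFAULT (NEWSEQUENTIALID())
        CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED,
    ProductSeq  INT IDENTITY(1, 1) NOT NULL,   -- hypothetical surrogate column
    natural_key NVARCHAR(100) NOT NULL         -- column type is an assumption
);

-- The narrow, ever-increasing INT becomes the clustered key.
CREATE UNIQUE CLUSTERED INDEX CIX_Product_ProductSeq
    ON dbo.Product (ProductSeq);

-- Non-clustered index for the natural_key LIKE 'Something%' query;
-- it carries the 4-byte ProductSeq rather than the 16-byte GUID.
CREATE NONCLUSTERED INDEX IX_Product_natural_key
    ON dbo.Product (natural_key);
```

Since the table here is rebuilt from scratch and then only read, it may be worth loading the data first and creating the non-clustered indexes afterwards.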

If you have a non-clustered primary key, your queries using a list of GUIDs will be just as fast as if it were a clustered primary key, and by not using GUIDs for your clustered key, your table will perform even better in the end.

Read up more on the clustered key, and why it's so important to pick the right one, in Kimberly Tripp's blog - she's the Queen of Indexing and can explain things much better than I can:

Marc

marc_s
Can we say that, in general, if a GUID is the primary key it should be non-clustered, since having it clustered doesn't give any benefit for a select with an IN query? In case I didn't make it clear: in my application the table is recreated from scratch each time, and users just query it once it is ready.
MicMit
+1  A: 

As well as GUIDs being bad (see the answer from marc_s), you also have an IN clause. This breaks down to:

productid = guid1 OR productid = guid2 OR ... OR productid = guidn

...in practice, which is not optimal either.

Generally, natural_key like 'Something%' will most likely benefit more from a clustered index on your natural key column.

gbn
For queries like these, I like to split the IN CSV list into a table and then just join to it, so that it will use an index.
KM
Thanks, I use the same idea if I have to use CSV
gbn
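
The split-and-join idea from the comments above could be sketched roughly as follows. The temp table #Product, the table variable @ids, and the sample GUID values are all made up for illustration, and the multi-row VALUES syntax assumes SQL Server 2008 or later. How the CSV actually gets parsed into rows is left out here.

```sql
-- Stand-in for the real product table (hypothetical shape).
CREATE TABLE #Product
(
    ProductID   UNIQUEIDENTIFIER NOT NULL PRIMARY KEY NONCLUSTERED,
    natural_key NVARCHAR(100) NOT NULL
);

INSERT INTO #Product (ProductID, natural_key)
VALUES ('11111111-1111-1111-1111-111111111111', N'Something A'),
       ('22222222-2222-2222-2222-222222222222', N'Something B'),
       ('33333333-3333-3333-3333-333333333333', N'Other C');

-- The list of GUIDs goes into its own keyed table instead of an IN list.
DECLARE @ids TABLE (ProductID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

INSERT INTO @ids (ProductID)
VALUES ('11111111-1111-1111-1111-111111111111'),
       ('33333333-3333-3333-3333-333333333333');

-- Join to the list; each GUID can be resolved by an index seek.
SELECT p.ProductID, p.natural_key
FROM #Product AS p
INNER JOIN @ids AS i
    ON i.ProductID = p.ProductID;

DROP TABLE #Product;
```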