views:

927

answers:

5

I have a number of indexes on some tables, they are all similar and I want to know if the Clustered Index is on the correct column. Here are the stats from the two most active indexes:

Nonclustered
I3_Identity (bigint)
rows: 193,781
pages: 3821
MB: 29.85
user seeks: 463,355
user_scans: 784
user_lookups: 0
updates: 256,516

Clustered Primary Key
I3_RowId (varchar(80))
rows: 193,781
pages: 24,289
MB: 189.76
user_seeks: 2,473,413
user_scans: 958
user_lookups: 463,693
updates: 2,669,261

As you can see, the PK is being seeked often, but all the seeks for the i3_identity column are doing key lookups to this PK as well, so am I really benefiting from the index on I3_Identity much at all? Should I change to using the I3_Identity as the clustered? This could have a huge impact as this table structure is repeated about 10000 times where I work, so any help would be appreciated.

+1  A: 

From what I've read in the past, two of the most important measures with regards to indexing tables are the number of queries performed against the index and the index density. By using DBCC_SHOWSTATISTICS([table],[index]), you can examine index density. The idea is that you want your clustered index on the columns that provide the most distinctness per query.

In short, if you look at the "All density" measure from DBCC SHOW_STATISTICS and notice the number is very low, this is a good index to cluster. It makes logical sense to cluster on an index that provides more uniqueness, but only if it's actively queried against. Clustering on a seldom-used index will probably do more harm than good.

In the end it's a judgment call. You may want to talk with your DBA and analyze your code to see where you'll get the biggest benefit. In this limited example, your indexing seems to be clustered in the right area if you only consider usage (and even when you consider all density, given the fact that a primary key provides the most uniqueness you can muster.)

Edit: There's a pretty good article on MSDN that explains what SHOW_STATISTICS provides you. I'm certainly not an uber DBA, but most of the information I've provided here came from guidance given by our DBA :)

Here's the article: http://msdn.microsoft.com/en-us/library/ms174384.aspx

Scott Anderson
+2  A: 

Here's the best discussion I've found about the topic. Kimberly Tripp is a MS blogger that stays on top of the debate. I could interpret it for you, but you obviously uonderstand the basic words and concepts, and the article is highly readable. So enjoy!

Hint: you'll find that short answers are almost always too simplistic.

le dorfier
+2  A: 

quick 'n dirty:

Put the clustered index on:

  • a column who's values (almost) never change

  • a column for which values on new records increase / decrease sequentially

  • a column where you perform range - searches

Frederik Gheysels
+1  A: 

Generally, when I see key lookups to the primarykey/clustered key, it means I need to include (using the INCLUDE statement) more columns in the non-clustered key. Look at your queries and see what columns are being selected/used in those statements. If you include those columns in the non-clustered key, then it won't need to do the key lookup any more.

Al W
+5  A: 

Frederik sums it up nicely, and that's really what Kimberly Tripp also preaches: the clustering key should be stable (never changes), ever increasing (IDENTITY INT), small and unique.

In your scenario, I'd much rather put the clustering key on the BIGINT column rather than the VARCHAR(80) column.

First of all, with the BIGINT column, it's reasonably easy to enforce uniqueness (if you don't enforce and guarantee uniqueness yourself, SQL Server will add a 4-byte "uniquefier" to each and every one of your rows) and it's MUCH smaller on average than a VARCHAR(80).

Why is size so important? The clustering key will also be added to EACH and every one of your non-clustered indexes - so if you have a lot of rows and a lot of non-clustered indexes, having 40-80 byte vs. 8 byte can quickly make a HUGE difference.

Also, another performance tip: in order to avoid the so-called bookmark lookups (from a value in your non-clustered index via the clustering key into the actual data leaf pages), SQL Server 2005 has introduced the notion of "included columns" in your non-clustered indexes. Those are extremely helpful, and often overlooked. If your queries often require the index fields plus just one or two other fields from the database, consider including those in order to achieve what is called "covering indexes". Again - see Kimberly Tripp's excellent article - she's the SQL Server Indexing Goddess! :-) and she can explain that stuff much better than I can...

So to sum it up: put your clustering key on a small, stable, unique column - and you'll do just fine!

Marc

marc_s