Hi,

I would like some advice on this. I have a table where I want to keep track of an object and a list of keys related to that object. Example:

OBJECTID   ITEMTYPE   ITEMKEY
--------   --------   -------
1          1          THE
1          1          BROWN
1          2          APPLE
1          3          ORANGE
2          2          WINDOW

Both OBJECTID and ITEMKEY have high selectivity (i.e. their values are very varied). The table is accessed in two ways:

  • By OBJECTID: Each time an object changes, its list of keys changes, so an index keyed on OBJECTID is needed. Changes happen frequently.

  • By ITEMKEY: This is for keyword searching and also happens frequently.

So I probably need two indexes, and have to choose one of them as the clustered index (the one that is accessed more frequently, or where I want the speed to be; for now let's assume I will prioritize OBJECTID for clustering). What I am confused about is how I should design it.

My question is, which is better:

a) A clustered index on (OBJECTID,ITEMTYPE,ITEMKEY), and then an index on (ITEMKEY). My concern is that since the clustered key is so wide (2 ints, 1 string), the other index will be big, because every entry in it has to point back to the clustered key.

b) Create a new column with a running identity DIRECTORYID (integer) as primary key and clustered index, and declare two indexes, on (OBJECTID,ITEMTYPE,ITEMKEY) and on just (ITEMKEY). This will minimize index space but have higher lookup costs.

c) A clustered index on (OBJECTID,ITEMTYPE,ITEMKEY), and a materialized view of (ITEMKEY,ITEMTYPE,OBJECTID) on it. My logic is that this avoids a key lookup and will be no bigger than the index with a lookup in a), at the cost of higher maintenance overhead.

d) Err...maybe there is a better way given the requirements?

Thanks in advance, Andrew

+1  A: 

If at all possible, try to keep your clustered key as small as possible, since it will also be added to every non-clustered index on your table.

Therefore, I would use an INT if at all possible, or possibly a combination of two INTs, but certainly never a VARCHAR column, especially if that column is potentially wide (> 10 chars) and is bound to change.

So of the options you present, I personally would choose b). Why?

Adding a surrogate DirectoryID will satisfy all crucial criteria for a clustering key:

  • small
  • stable
  • unique
  • ever-increasing

and your other non-clustered indices will be minimally impacted.

See Kimberly Tripp's outstanding blog post on the main criteria for choosing a good clustering key on your SQL Server tables - very useful and enlightening!

To satisfy your query requirements, I would add two non-clustered indices: one on ObjectID (possibly including other columns that are frequently needed), and another on ItemKey to search by key name.
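
As a rough illustration of option b), the layout described above might look like the T-SQL below. This is only a sketch: the table name dbo.ObjectKeys, the index names, and the ITEMKEY length are assumptions made for the example and do not come from the question.

```sql
-- Sketch of option b). Table name, index names and the ItemKey length
-- are hypothetical; adjust them to the real schema.
CREATE TABLE dbo.ObjectKeys
(
    DirectoryID INT IDENTITY(1,1) NOT NULL,  -- small, stable, unique, ever-increasing
    ObjectID    INT          NOT NULL,
    ItemType    INT          NOT NULL,
    ItemKey     VARCHAR(100) NOT NULL,       -- length is a guess
    CONSTRAINT PK_ObjectKeys PRIMARY KEY CLUSTERED (DirectoryID)
);

-- Access path 1: find the rows of one object when that object changes.
CREATE NONCLUSTERED INDEX IX_ObjectKeys_ObjectID
    ON dbo.ObjectKeys (ObjectID);

-- Access path 2: keyword search.
CREATE NONCLUSTERED INDEX IX_ObjectKeys_ItemKey
    ON dbo.ObjectKeys (ItemKey);
```

Each non-clustered index row carries the 4-byte DirectoryID as its pointer back to the table, rather than the wide (OBJECTID,ITEMTYPE,ITEMKEY) combination from option a).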

marc_s
Thanks for pointing out the post, it's enlightening. (It may seem more intuitive to cluster by what is most often used, but from the article it seems that in real-world situations there are other overheads involved that make it better to follow these rules for the cluster key!)
andrwo
marc_s: I have a question for you: would it make sense to use (OBJECTID,DirectoryID) as the cluster key? This would at least cluster by one criterion while keeping the cluster key smallish (but it would lose the property of always inserting at the end of the table). Is it worth it, and would you ever do this in your design?
andrwo
@andrwo: if DirectoryID is an INT IDENTITY column, then I would cluster on only this single INT. There is no point in adding a second INT to the clustering index, really. Why would you want to do this?
marc_s
@marc_s: Purely for performance reasons. Since the table is huge, I figured it would shortcut the bookmark lookup when accessing by OBJECTID (if clustered by (DirectoryID), SQL Server still needs to look up each item to change it). There are actually arguments both ways depending on what I want to optimize (clustered index fragmentation, space, update performance), but I am just polling general opinion to see whether DB designers typically do this kind of thing, or whether they usually try their best to stick to the increasing-integer-identity cluster rule. Thanks for your help so far!
andrwo
@andrwo: if you want to reduce bookmark lookups, create a non-clustered index on the columns you want to search on, and INCLUDE any additional columns that make sense. I wouldn't "pollute" the clustered index, which is so crucial to good performance, with anything extra unless absolutely necessary...
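
A minimal sketch of what such a covering index could look like, assuming a hypothetical table dbo.ObjectKeys(DirectoryID, ObjectID, ItemType, ItemKey) that is clustered on DirectoryID (the table and index names are made up for the example):

```sql
-- Hypothetical names; assumes dbo.ObjectKeys is clustered on DirectoryID.
-- ItemKey is the search column; ObjectID and ItemType are stored only at
-- the leaf level of the index through INCLUDE.
CREATE NONCLUSTERED INDEX IX_ObjectKeys_ItemKey_Covering
    ON dbo.ObjectKeys (ItemKey)
    INCLUDE (ObjectID, ItemType);

-- Every column this query touches is present in the index above, so it
-- can be answered from the index alone, with no bookmark lookup into the
-- clustered index.
SELECT ObjectID, ItemType
FROM dbo.ObjectKeys
WHERE ItemKey = 'APPLE';
```

The trade-off is extra storage and write cost for the included columns, so it is worth including only the columns the frequent queries actually read.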
marc_s
@marc_s: Ok got you, thank you for your guidance and opinions on this.
andrwo