views:

2249

answers:

8

I have a table like so:

keyA keyB data

keyA and keyB together are unique, are the primary key of my table and make up a clustered index.

There are 5 possible values of keyB but an unlimited number of possible values of keyA,. keyB generally increments.

For example, the following data can be ordered in 2 ways depending on which key column is ordered first:

keyA keyB data
A    1    X
B    1    X
A    3    X
B    3    X
A    5    X
B    5    X
A    7    X
B    7    X

or

keyA keyB data
A    1    X
A    3    X
A    5    X
A    7    X
B    1    X
B    3    X
B    5    X
B    7    X

Do I need to tell the clustered index which of the key columns has fewer possible values to allow it to order the data by that value first? Or does it not matter in terms of performance which is ordered first?

A: 

The best thing you can do is to try both solutions and measure the execution time.

In my experience, index tuning is all but exact-science.

Maybe having keyB before keyA in the index column order would be better

Davide Vosti
It is in fact based on concrete scientific ideas. Learning a bit about how b-tree indexes work will make you more informed and require less guess work.
Sam
+1 for being honest. Unless you know exactly how (e.g.) SQL Server works internally, you cannot be sure how things work out in practice.Theory is great though. No, really ;)
Andrew Rollings
+2  A: 

I believe that SQL Server orders it exactly the way you tell it. It assumes that you know best how to access your index.

In any case, I would say it's a good idea where possible to specify what you want exactly rather than hoping the database will figure it out.

You can also try it both ways, run a bunch of representative queries and then compare the generated execution plans to determine which is best for you.

Andrew Rollings
Gave this an upvote, but just want to point out that while it's good to specify what you want in this situation, often times you should let the server figure out what is best. For example, using index hints in queries is generally a bad idea as the best plan can change as your data does.
Tom H.
Agreed. Index hints are evil brute force solutions of last resort. I was referring to creating the index itself both ways and then testing representative queries. (That's what I do, anyway :) )
Andrew Rollings
+3  A: 

If you create an index (regardless clustered or not) with (keyA, keyB) then this is how values will be ordered, e.g. first keyA, then keyB (this is the second case in your question). If you want it the other way around, you need to specify (keyB, keyA).

It could matter performance-wise, depends on your query of course. For example, if you have (keyA, keyB) index and the query looks like WHERE keyB = ... (without mentioning keyA) then the index can't be utilized.

liggett78
A: 

You specify the columns in the order in which you would normally want them sorted in reports and queries.

I would be wary of creating a multicolumn clustered index though. Depending on how wide this is, you could have a huge impact on the size of any other indexes you create because all non-clustered indexes contain the clustered index value in them. Also the rows have to be re-ordered if the values frequently change and it is my experience that non-surrogate keys tend to change more frequently. Therefore creating this as a clustered vice nonclustered index could be much more time consuming of server resources if you have values that are likely to change. I'm not saying you shouldn't do this as I don't know what type of data your columns actually contain (although I suspect they are more complex that A1, a2, etc); I'm saying you need to think about the ramifications of doing it. It would probably be a good idea to thoroughly read BOL about clustered vice nonclustered indexes before committing to doing this.

HLGEM
+1  A: 

As others have said, the ordering is based on how you specify it in the index creation script (or PK constraint). One thing about clustered indexes though is that there is a lot to keep in mind.

You may get better overall performance by using your clustered index on something other than the PK. For example, if you are writing a financial system and reports are almost always based on date and time of an activity (all activity for the past year, etc.) then a clustered index on that date column might be better. As HLGEM says, sorting can also be affected by your selection of clustered index.

Clustered indexes can also affect inserts more than other indexes. If you have a high volume of inserts and your clustered index is on something like an IDENTITY column then there could be contention problems for that particular part of the disk since all of the new rows are being inserted into the same place.

For small look-up tables I always just put the clustered index on the PK. For high-impact tables though it's a good idea to spend the time thinking about (and testing) various possible clustered indexes before choosing the best one.

Tom H.
A: 

Remember that the clustered index is the physical order in which the table is stored on disk.

So if your clustered index is defined as ColA, ColB queries will be faster when order in the same order as your clustered index. If SQL has to order B,A it will require post execution sorting to achieve the correct order.

My suggestion is to add a second non-clustered index on B,A. Also depending on the size of your data column to INCLUDE(read included column) it to prevent the need for key lookups. That is, of course, provided that this table is not heavily inserted, as you always must balance query speed vs. write speed.

Realistically, your clustered index should represent the order in which the data is most likely to be accessed as well as maintaining a delicate balance of insert\update IO cost. If your clustered index is such that you are constantly inserting into the middle of pages, you may suffer performance losses there.

Like others have said, without knowing the table length, column sizes, etc. there is no correct answer. Trial and error with a heavy dose of testing is your best bet.

Brian Rudolph
+6  A: 

You should order your composite clustered index with the most selective column first. This means the column with the most distinct values compared to total row count.

"B*TREE Indexes improve the performance of queries that select a small percentage of rows from a table." http://www.akadia.com/services/ora_index_selectivity.html?

This article is for Oracle, but still relevant.

Also, if you have a query that runs constantly and returns few fields, you may consider creating a composite index that contains all the fields - it will not have to access the base table, but will instead pull data from the index.

ligget78's comment on making sure to mention the first column in a composite index is important to remember.

Sam
+1  A: 

Just in case this isn't obvious: the sort order of your index does not promise much about the the sort order of the results in a query.

In your queries, you must still add an

ORDER BY KeyA, KeyB

or

ORDER BY KeyB, KeyA

The optimizer may be pleased to find the data already physically ordered in the index as desired and save some time, but every query that is supposed to deliver data in a particular order must have an ORDER BY clause at the end of it. Without an order by, SQL Server makes no promises with respect to the order of a recordset, or even that it will come back in the same order from query to query.

Michael Haren