ansaurus

Question

Should I create a unique clustered index, or non-unique clustered index on this SQL 2005 table?

Answer 1

+3 A:

First off, I would definitely recommend having a clustered index!

Secondly, your clustered index should be:

narrow
static (never or hardly ever change)
unique
ever-increasing

so an INT IDENTITY is a very well thought out choice.

When your clustering key is not unique, SQL Server will add a 4-byte uniqueifier to the rows with duplicate key values - thus making your clustering key, and with it all non-clustered indices on that table, larger and less optimal.

So in your case, I would pick the ID - it's narrow, static, unique and ever-increasing - can't be more optimal than that! Since the Sequence is used heavily in UPDATE statements, definitely put a non-clustered index on it, too!
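A minimal sketch of that layout, assuming a hypothetical table dbo.Items: only the ID and Sequence columns come from the question, while the table name, the index names, the INT type of Sequence and the Payload column are made up for illustration.

```sql
-- Hypothetical table: only ID and Sequence are taken from the question.
CREATE TABLE dbo.Items
(
    ID       INT IDENTITY(1, 1) NOT NULL,  -- narrow, static, unique, ever-increasing
    Sequence INT NOT NULL,                 -- assumed INT; may contain duplicates
    Payload  VARCHAR(300) NULL,            -- placeholder for the remaining columns
    CONSTRAINT PK_Items PRIMARY KEY CLUSTERED (ID)
);

-- Non-clustered index to support the UPDATE statements that filter on Sequence.
CREATE NONCLUSTERED INDEX IX_Items_Sequence
    ON dbo.Items (Sequence);
```

With this layout, new rows are always appended at the end of the clustered index, and each entry in IX_Items_Sequence carries the 4-byte ID as its row locator.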

See Kimberly Tripp's excellent blog posts on choosing the right clustering key for great background info on the topic.

marc_s 2010-04-30 21:11:30

Good answer. I saw the banner saying an answer was added as I was hitting submit.

TimothyAWiseman 2010-04-30 21:15:33

Thanks. I'm going to read that link this weekend. I want to make an informed decision. I should state that my "Sequence" clustered index would be:

narrow (it's an int)
static (never changes)
almost unique (very limited duplicates, 10-20% of all records at most and limited to under 5 rows for each duplicate)
ever-increasing

Is this an exception to the rule, considering what we are doing?

Bremer 2010-04-30 21:37:10

@Bremer: if your clustering key is not ever-increasing, you'll have to deal with page splits when you insert a new row into the middle of a full page --> not so good for performance.

marc_s 2010-05-01 07:08:13

After reading all the great suggestions here, I went with the ID column. It would seem that overall, reducing page splits and database fragmentation would be the best result for the system as a whole, even though one app using the Sequence column might suffer slightly. Thanks everyone for the excellent information.

Bremer 2010-05-03 13:58:45

Answer 2

+2 A:

As a general rule, you want your clustered index to be unique. If it is not, SQL Server will in fact add a hidden "uniqueifier" to it to force it to be unique, and this adds overhead.

So, you are probably best using the ID column as your index.

Just as a side note, using an identity column as your primary key is normally referred to as a surrogate key, since it is not inherent in your data. When you have a unique natural key available, that is probably a better choice. In this case it looks like you do not, so using the unique surrogate key makes sense.

TimothyAWiseman 2010-04-30 21:14:31

I know that this is the general recommendation, but I see a unique case here. Is this scenario possibly one of the ones where the "general rule" does not apply?

Bremer 2010-04-30 21:31:19

I do not see why this case is unique. As to whether the general rule applies, from what has been described so far, I would say yes. Really determining this in detail would require extensive testing with your exact application, all of it. But all details so far would indicate that the ID column is the way to go.

TimothyAWiseman 2010-05-04 23:34:51

Answer 3

+1 A:

The worst thing about out-of-order inserts is page splits.

When SQL Server needs to insert a new record into an existing index page and finds no room there, it takes half the records from the page and moves them into a new one.

Say, you have these records filling the whole page:

1 2 3 4 5 6 7 8 9

and need to insert a 10. In this case, SQL Server will just start the new page.

However, if you have this:

1 2 3 4 5 6 7 8 11

then 10 should go before 11. In this case, SQL Server will move records from 6 to 11 into the new page, which then holds:

6 7 8 10 11

The old page, as can easily be seen, will remain about half filled (only records 1 to 5 will stay there).

This will increase the index size.

Let's create two sample tables:

CREATE TABLE perfect (id INT NOT NULL PRIMARY KEY, stuffing VARCHAR(300))
CREATE TABLE almost_perfect (id INT NOT NULL PRIMARY KEY, stuffing VARCHAR(300))

;
WITH    q(num) AS
        (
        SELECT  1
        UNION ALL
        SELECT  num + 1
        FROM    q
        WHERE   num < 200000
        )
INSERT
INTO    perfect
SELECT  num, REPLICATE('*', 300)
FROM    q
OPTION (MAXRECURSION 0)

;
WITH    q(num) AS
        (
        SELECT  1
        UNION ALL
        SELECT  num + 1
        FROM    q
        WHERE   num < 200000
        )
INSERT
INTO    almost_perfect
SELECT  num + CASE num % 5 WHEN 0 THEN 2 WHEN 1 THEN 0 ELSE 1 END, REPLICATE('*', 300)
FROM    q
OPTION (MAXRECURSION 0)

EXEC sp_spaceused N'perfect'
EXEC sp_spaceused N'almost_perfect'

name            rows     reserved    data        index_size  unused
perfect         200000   66960 KB    66672 KB    264 KB      24 KB
almost_perfect  200000   128528 KB   128000 KB   496 KB      32 KB

Even with only 20% probability of the records being out of order, the table becomes twice as large.
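To look at the page splits more directly than through table size, one option is to query sys.dm_db_index_physical_stats for the leaf level of each clustered index. This is only a sketch: it assumes the two tables above exist in the current database, and it uses variables for the IDs so that it also runs under older compatibility levels.

```sql
-- SQL Server 2005 does not allow DECLARE with an initializer, hence the SETs.
DECLARE @db INT, @perfect INT, @almost INT;
SET @db      = DB_ID();
SET @perfect = OBJECT_ID(N'perfect');
SET @almost  = OBJECT_ID(N'almost_perfect');

-- index_id = 1 is the clustered index; index_level = 0 is its leaf level.
SELECT  N'perfect' AS table_name,
        page_count,
        avg_page_space_used_in_percent,
        avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(@db, @perfect, 1, NULL, 'DETAILED')
WHERE   index_level = 0
UNION ALL
SELECT  N'almost_perfect',
        page_count,
        avg_page_space_used_in_percent,
        avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(@db, @almost, 1, NULL, 'DETAILED')
WHERE   index_level = 0;
```

A lower avg_page_space_used_in_percent together with a higher page_count for almost_perfect would be the signature of half-empty pages left behind by splits.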

On the other hand, having a clustered key on Sequence will halve the I/O for lookups by Sequence (since each one can be done with a single clustered index seek rather than a nonclustered index seek followed by a lookup into the clustered index).

So I'd take a sample subset of your data, insert it into the test table with a clustered index on Sequence and measure the resulting table size.

If it is less than twice the size of the same table with a clustered index on ID, I'd go for the clustered index on Sequence (since the total resulting I/O will be less).

If you decide to create a clustered index on Sequence, make ID a nonclustered PRIMARY KEY and make the clustered index UNIQUE on (Sequence, ID). This will use a meaningful ID instead of an opaque uniqueifier.
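A rough sketch of that alternative, again with hypothetical names: the table dbo.ItemsBySequence, the index names and the Payload column are assumptions, and only ID and Sequence come from the question.

```sql
-- Hypothetical test table for the "cluster on Sequence" variant.
CREATE TABLE dbo.ItemsBySequence
(
    ID       INT IDENTITY(1, 1) NOT NULL,
    Sequence INT NOT NULL,                 -- assumed INT; may contain duplicates
    Payload  VARCHAR(300) NULL,            -- placeholder for the remaining columns
    CONSTRAINT PK_ItemsBySequence PRIMARY KEY NONCLUSTERED (ID)
);

-- Appending ID makes the key unique, so no hidden uniqueifier is needed.
CREATE UNIQUE CLUSTERED INDEX CIX_ItemsBySequence_Sequence_ID
    ON dbo.ItemsBySequence (Sequence, ID);

-- After loading a sample of the real data, check the resulting size.
EXEC sp_spaceused N'dbo.ItemsBySequence';
```

Loading the same sample into a copy clustered on ID and comparing the two sp_spaceused results gives the size ratio for the test described above.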

Quassnoi 2010-04-30 22:22:27
