views:

59

answers:

3

This question is about what happens with the reorganizing of data in a clustered index when an insert is done. I assume that it should be more expensive to do inserts on a table which has a clustered index than one that does not because reorganizing the data in a clustered index involves changing the physical layout of the data on the disk. I'm not sure how to phrase my question except through an example I came across at work.

Assume there is a table (Junk) and there are two queries that are done on the table, the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query, like so:

--drop table Junk1
CREATE TABLE Junk1
(
    Name char(5),  
    Something char(5),
    WhoCares int
)

CREATE CLUSTERED INDEX IX_Name ON Junk1
(
    Name
)

CREATE NONCLUSTERED INDEX IX_Name_Something ON Junk1
(
    Name, Something
)

Now when I looked at the two indexes, it seems that IX_Name is redundant since IX_Name_Something can be used by any query that desires to search by Name. So I would eliminate IX_Name and make IX_Name_Something the clustered index instead:

--drop table Junk2
CREATE TABLE Junk2
(
    Name char(5),  
    Something char(5),
    WhoCares int
)

CREATE CLUSTERED INDEX IX_Name_Something ON Junk2
(
    Name, Something
)

Someone suggested that the first indexing scheme should be kept since it would result in more efficient inserts/deletes (assume that there is no need to worry about updates for Name and Something). Would that make sense? I think the second indexing method would be better since it means one less index needs to be maintained.

I would appreciate any insight into this specific example or directing me to more info on maintenance of clustered indexes.

+7  A: 

Yes, inserting into the middle of an existing table (or its page) could be expensive when you have a less than optimal clustered index. Worst case would be a page split : half the rows on the page would have to be moved elsewhere, and indices (including non-clustered indices on that table) need to be updated.

You can alleviate that problem by using the right clustered index - one that ideally is:

  • narrow (only a single field, as small as possible)
  • static (never changes)
  • unique (so that SQL Server doesn't need to add 4-byte uniqueifiers to your rows)
  • ever-increasing (like an INT IDENTITY)

You want a narrow key (ideally a single INT) since each and every entry in each and every non-clustered index will also contain the clustering key(s) - you don't want to put lots of columns in your clustering key, nor do you want to put things like VARCHAR(200) there!

With an ever increasing clustered index, you will never see the case of a page split. The only fragmentation you could encounter is from deletes ("swiss cheese" problem).

Check out Kimberly Tripp's excellet blog posts on indexing - most notably:

Assume there is a table (Junk) and there are two queries that are done on the table, the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query, like so:

That's definitely not necessary - if you have one index on (Name, Something), that index can also and just as well be used if you search and restrict on just WHERE Name = abc - having a separate index with just the Name column is totally not needed and only wastes space (and costs time to be kept up to date).

So basically, you only need a single index on (Name, Something), and I would agree with you - if you have no other indices on this table, then you should be able to make this the clustered key. Since that key won't be ever-increasing and could possibly change, too (right?), this might not be such a great idea.

The other option would be to introduce a surrogate ID INT IDENTITY and cluster on that - with two benefits:

  • it's all a good clustered key should be, including ever-increasing -> you'll never have any issues with page splits and performance for INSERT operations
  • you still get all the benefits of having a clustering key (see Kim Tripps' blog posts - clustered tables are almost always preferable to heaps)
marc_s
Nice, thorough explanation.
Joe Stefanelli
A: 

Someone suggested that the first indexing scheme should be kept since it would result in more efficient inserts/deletes

That's a bogus claim. Ordered data is ordered data and the same IO would be performed.

SET STATISTICS IO ON
-- your insert statement here
David B
A: 

You can create a clustered index only on one column, not two or more so choose the column which your app will mostly be querying on, like wildcard queries on customer fullnames, etc. (see discussion)

Frank Computer
That is false, please read:http://msdn.microsoft.com/en-us/library/aa933131(SQL.80).aspx"a table can contain only one clustered index. However, the index can comprise multiple columns"
Anssssss