views:

530

answers:

8

I have a 1 TB, 600m row, table which has a misguided choice of indexed columns, specifically a clustered index on the primary key column which is never used in a select query.

I want to remove the clustered index from this row and create it on a number of other rows.

Table is currently like this:

  • colA (PK, nvarchar(3)) [clustered index pt b]

  • colB (PK, bigint) [clustered index pt a]

  • colC (DateTime) [non-clustered index]

  • colD (Money) [non-clustered index]

  • colE (bit) [no index]

  • colF (bit) [no index]

  • colG (int) [no index]

  • more non-indexed columns

I would like to change it to look like this:

  • colA (PK, nvarchar(3)) [clustered index pt a]

  • colB (PK, bigint) [non-clustered index]

  • colC (DateTime) [non-clustered index]

  • colD (Money) [clustered index pt d]

  • colE (bit) [clustered index pt b]

  • colF (bit) [clustered index pt c]

  • colG (int) [clustered index pt e]

  • more non-indexed columns

Two questions: 1) How long would you guesstimate that this change will take (server spec at end of message). Unfortunately it is a live DB and I can't have downtime without some idea of how long it will be down for.

2) Is it a terrible idea to add so many columns to a clustered index? Updates are nearly never performed. There are many inserts and many selects which always use all of the proposed indexed rows as select parameters.

Server spec: 5 x 15kRPM drives in RAID 5, MS-SQL Sever 2005 and some bits to keep them running.

A: 

You may not need to worry about the downtime, as it may be possible to do the change live (without any downtime). Applies to SQL Server 2005 Enterprise edition.

MarkusQ
+3  A: 

You should have a development environment with similar specs that you can use to try this with a copy of the live database.

Brian Ensink
It certainly would be great to have this. I'm looking for sub $7000 answers right now, but may have to go with your suggestion in future.
Mr. Flibble
A: 

One thing you could do if you have the disk space is create a second table with the correct clustered index copy the rows over over to the new table over several days via an incremental process. Once all the rows are there execute sp_rename on both tables (this would require just a few minutes of downtime. If your apps were referencing a view instead of the physical table you could have this done with zero downtime to your apps. I hope this helps.

[Edit] You'll also have to deal with the update to the rows, you need to have a timestamp, or last updated field available on the source table so that you can sync the updates once you have all the rows copied over.

James
+9  A: 

For one thing, I would AVOID making the clustered index wider than it absolutely has to be. Making it into five parts seems about contra-productive. Are ALL the columns in this compound clustered index stable, e.g. never change??

If not, I would avoid them at all costs. A clustered index should be:

  • unique
  • stable
  • as narrow as possible

You can change your non-clustered indices - no problem. But avoid making the clustered index messy! That'll definitely bring down your performance!

Check out Kimberly Tripp's excellent blog articles on indexing:

Marc

marc_s
This is not always true. If the table is read heavy (which his is) than a wider clustered index specifically on fields that are used for search args would actually increase query performance per the purpose of his proposed new clustered index).
James
Possibly - but one thing a lot of folks don't consider: the whole clustered key will also be part of any non-clustered key (all the leaf nodes of the non-clustered index) and could thus bloat the space needed.
marc_s
Those Kimberly articles are great. Thank you. Is seems that there are so many intricacies to indexing that I really need to post a full spec of my problem to get a solid answer. I'll try and get this done over the weekend and re-post.
Mr. Flibble
Yes, indeed - indexing is a lot more and a lot trickier than one would believe at first :-)
marc_s
A: 

1) How long would you guesstimate that this change will take (server spec at end of message). Unfortunately it is a live DB and I can't have downtime without some idea of how long it will be down for.

It really, really depends on the data. Just the table parameters alone doesn't provide enough information. Could be a few minutes (unlikely) to a few days (unlikely) with the likeliest time being somewhere inbetween.

2) Is it a terrible idea to add so many columns to a clustered index? Updates are nearly never performed. There are many inserts and many selects which always use all of the proposed indexed rows as select parameters.

No, that should not pose any problems. Performance should only improve if you are making few updates. When those updates occur, it'll take awhile to fix the index, though, and performance will suffer during that time, which will vary depending on the data.

Adam Davis
+2  A: 

While changing the clustered index sounds like it would certainly help here, why don't you try adding a (nonclustered) covering index first?

Shouldn't take the table down while the new index is built, and should give you an indication of what performance improvement (if any) will result in this reorganization.

BradC
Unless this is SQL 2005 Enterprise Edition (which he doesn't say) creating a new index is an offline operation.
mrdenny
A: 

I agree with Brian, you should have a test database with same amount of data and run the index change. But, I presume that you are making this change because you think it will speed up the queries. You should run benchmarks test (before and after the index change) and ensure that your optimization doesn't become a pessimization.

Nathan Koop
+5  A: 

I made the changes and it didn't take too long. Here are the times for each operation, first time is when run on a backup server with a single 7200RPM drive, and the second on the main server with 15k drives in RAID.

ALTER TABLE Table DROP CONSTRAINT [PK_Table]

2:39 hrs / 19 minutes

CREATE CLUSTERED INDEX [IX_Clustered] ON [Table] 
(
 [a] ASC,
 [b] ASC,
 [c] ASC,
 [d] ASC,
 [e] ASC,
 [f] ASC
)WITH (PAD_INDEX  = ON, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 90, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = OFF) ON [PRIMARY]

15:30 hrs / 2 hrs

ALTER TABLE Table ADD CONSTRAINT
PK_hands PRIMARY KEY NONCLUSTERED 
(
 e,
 h
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

4 hrs / 1 hr

The select query most often used now takes < 10 seconds where it often took 10 to 15 minutes before. Nice improvement! Insert times seem a bit faster too.

Mr. Flibble