Guys,

Here is the scenario:

SQL Server 2000 (8.0.2055)

Table currently has 478 million rows of data. The primary key column is an INT with IDENTITY. There is a unique constraint imposed on two other columns, enforced by a non-clustered index. This is a vendor application and we are only responsible for maintaining the DB.

Now the vendor has recommended doing the following "to improve performance"

  1. Drop the PK and Clustered Index
  2. Drop the non-clustered index on the two columns with the UNIQUE CONSTRAINT
  3. Recreate the PK, with a NON-CLUSTERED index
  4. Create a CLUSTERED index on the two columns with the UNIQUE CONSTRAINT
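In DDL terms, the vendor's four steps would look roughly like this (table, column, and constraint names here are invented for illustration):

```sql
-- 1. Drop the PK, which takes the clustered index with it
ALTER TABLE dbo.BigTable DROP CONSTRAINT PK_BigTable;

-- 2. Drop the unique constraint and its non-clustered index
ALTER TABLE dbo.BigTable DROP CONSTRAINT UQ_BigTable_ColA_ColB;

-- 3. Recreate the PK as non-clustered
ALTER TABLE dbo.BigTable ADD CONSTRAINT PK_BigTable
    PRIMARY KEY NONCLUSTERED (Id);

-- 4. Cluster the table on the two formerly-constrained columns
CREATE CLUSTERED INDEX IX_BigTable_ColA_ColB
    ON dbo.BigTable (ColA, ColB);
```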

I am not convinced that this is the right thing to do. I have a number of concerns.

By dropping the PK and indexes, you will be creating a heap with 478 million rows of data. Then creating a CLUSTERED INDEX on two columns would be a really mammoth task. Would creating another table with the same structure and new indexing scheme and then copying the data over, dropping the old table and renaming the new one be a better approach?
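A sketch of that copy-and-swap approach (all names invented, and the real table presumably has more columns than shown):

```sql
-- Build the replacement table with the new indexing scheme up front
CREATE TABLE dbo.BigTable_New (
    Id   INT IDENTITY NOT NULL,
    ColA INT NOT NULL,
    ColB DATETIME NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX IX_BigTable_New_ColA_ColB
    ON dbo.BigTable_New (ColA, ColB);
ALTER TABLE dbo.BigTable_New ADD CONSTRAINT PK_BigTable_New
    PRIMARY KEY NONCLUSTERED (Id);

-- Copy the data across, preserving existing identity values
SET IDENTITY_INSERT dbo.BigTable_New ON;
INSERT INTO dbo.BigTable_New (Id, ColA, ColB)
SELECT Id, ColA, ColB FROM dbo.BigTable;
SET IDENTITY_INSERT dbo.BigTable_New OFF;

-- Swap the tables
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
```

A single 478-million-row INSERT will hammer the transaction log, so in practice the copy would likely need to be batched.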

I am also not sure how the stored procs will react. Will they continue using the cached execution plans, given that they are not being explicitly recompiled?

I am simply not able to understand what kind of "performance improvement" this change will provide. I think that this will actually have the reverse effect.

All thoughts welcome.

Thanks in advance,

Raj

+1  A: 

All stored procs will recompile automatically; this happens anyway when statistics change and after index maintenance.
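If you'd rather force it than wait, `sp_recompile` run against the table marks every stored proc that references it for recompilation (table name invented here):

```sql
-- Flag all cached plans that touch the table for recompilation on next execution
EXEC sp_recompile 'dbo.BigTable';
```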

At some point, you have to either reorganise 478 million rows in place (drop/create indexes) or move them (new table). Neither way is better than the other, unfortunately. I feel your pain, though: we have similarly large tables with pending new columns and index changes.

That said, you should do the steps in the order 2, 1, 4, 3 to avoid unnecessary non-clustered index maintenance while you drop and recreate the clustered index.

And drop the unique constraint: the clustered index can itself be created as unique, so a separate unique constraint is just another index that would be unnecessary.
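Putting those two points together, the 2, 1, 4, 3 order with a unique clustered index replacing the constraint would look something like this (names invented):

```sql
-- Step 2: drop the unique constraint's non-clustered index first
ALTER TABLE dbo.BigTable DROP CONSTRAINT UQ_BigTable_ColA_ColB;

-- Step 1: drop the PK and its clustered index
ALTER TABLE dbo.BigTable DROP CONSTRAINT PK_BigTable;

-- Step 4: the clustered index itself now enforces uniqueness,
-- so no separate constraint is needed
CREATE UNIQUE CLUSTERED INDEX IX_BigTable_ColA_ColB
    ON dbo.BigTable (ColA, ColB);

-- Step 3: recreate the PK as non-clustered last, so it is built only once,
-- after the clustered index is already in place
ALTER TABLE dbo.BigTable ADD CONSTRAINT PK_BigTable
    PRIMARY KEY NONCLUSTERED (Id);
```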

As for the performance benefit, perhaps ask the vendor why.

gbn
Thanks. Which would be a better approach from a downtime perspective?
Raj
I'd be inclined to drop/create the indexes... however, with 478 million rows you may as well just do it rather than test it, perhaps :-)
gbn
Make sure you have enough disk space: about 2 x the table space (rebuilding a clustered index copies the data from the old structure to the new one), 1.5 to 2 x the index space (sort in tempdb if you can), and I think there's a hit on the transaction log file as well. If you have a testing setup big enough, try it out there first, if only to estimate what your production downtime will be like.
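For the space check and the tempdb sort, something along these lines (SQL Server 2000 syntax; names invented):

```sql
-- See how big the table and its indexes currently are
EXEC sp_spaceused 'dbo.BigTable';

-- Build the new clustered index with the sort spilled to tempdb
CREATE UNIQUE CLUSTERED INDEX IX_BigTable_ColA_ColB
    ON dbo.BigTable (ColA, ColB)
    WITH SORT_IN_TEMPDB;
```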
Philip Kelley
A: 

Would creating another table with the same structure and new indexing scheme and then copying the data over, dropping the old table and renaming the new one be a better approach?

I believe this is what SQL Enterprise Manager will do behind the scenes anyway if you use the visual tools. If you make a schema change such as adding a column in the middle of a table, or changing primary keys, there is a little button that lets you "Script Changes". If you view this script, you can see the steps Enterprise Manager will take to do what you requested.

Jeremy
Creating a new table and copying the data over a chunk at a time can be a valid option if you have space issues (and I'm talking about the transaction log here), but it's trickier to write and will take longer to run. Not to mention, I've found that when you're familiar with your schema, data, and hardware, it's not too hard to write better code than the one-size-fits-all stuff churned out by EM and SSMS.
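A common SQL Server 2000 pattern for the chunked copy walks the identity range in batches, assuming a pre-built replacement table (all names here are invented):

```sql
-- Copy in 100k-row batches so the log can be backed up between chunks
SET ROWCOUNT 100000;

DECLARE @maxId INT, @rows INT;
SELECT @maxId = ISNULL(MAX(Id), 0) FROM dbo.BigTable_New;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    SET IDENTITY_INSERT dbo.BigTable_New ON;

    INSERT INTO dbo.BigTable_New (Id, ColA, ColB)
    SELECT Id, ColA, ColB
    FROM dbo.BigTable
    WHERE Id > @maxId
    ORDER BY Id;

    -- capture the count before any SET statement resets @@ROWCOUNT
    SET @rows = @@ROWCOUNT;

    SET IDENTITY_INSERT dbo.BigTable_New OFF;
    SELECT @maxId = ISNULL(MAX(Id), @maxId) FROM dbo.BigTable_New;

    -- a BACKUP LOG here keeps the transaction log from ballooning
END

SET ROWCOUNT 0;
```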
Philip Kelley
+1  A: 

The one thing I would have a serious look at is what type those other two columns are: how big are they compared to the INT IDENTITY (4 bytes)?

The reason I ask: the clustering key will be added to every non-clustered index on the table, too - and if you have close to 500 million rows, it makes a huge difference whether the clustering key is a single 4-byte INT or, e.g., two 16-byte GUIDs.

This is not only on disk, mind you - the pages are loaded into SQL Server's RAM in their entirety - so by potentially bloating up your clustering key, you'd incur performance penalties due to the larger number of pages on disk (and in RAM) that your non-clustered indices would need.
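To put a rough number on the worst case sketched above (two 16-byte GUIDs vs one 4-byte INT), the extra key bytes carried into each non-clustered index row add up quickly:

```sql
-- 32 - 4 = 28 extra key bytes per row, across 478 million rows,
-- repeated in every non-clustered index on the table
SELECT CAST(478000000 AS BIGINT) * (32 - 4) / 1048576 AS extra_mb_per_nc_index;
-- returns 12763, i.e. roughly 12.5 GB of extra key data per non-clustered index
```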

The only compelling reason I could see to actually go through with those changes would be if by clustering the table using those two other columns, you'd gain something in terms of query performance, e.g. if some of the most frequent queries would be faster, due to the fact that the table is now clustered by these two columns. That's really hard to know unless you know what the access and query patterns really are....

marc_s
One of the columns is int and the other one is datetime.
Raj