views:

296

answers:

6

I have a table in a SQL 2005 database which is brand new. As part of our application deployment we load the table with about 2.6M rows. Once that is done, the indexes on the table are all rebuilt. Then the users are let into the system and queries against that table time out. I can then rebuild the indexes (using the same exact script that was used after the import) and the queries are lightning fast.

I've checked that there are no other major data changes to the table after the index rebuilds. Any ideas on what else might cause this behavior?

Here's a sample of what the index rebuild script looks like:

DROP INDEX dbo.My_Table.Index1
DROP INDEX dbo.My_Table.Index2

ALTER INDEX PK_My_Table ON dbo.My_Table REBUILD

CREATE NONCLUSTERED INDEX Index1 ON dbo.My_Table (column_1 ASC)
CREATE NONCLUSTERED INDEX Index2 ON dbo.My_Table (column_2 ASC)
A: 

I would assume that something in your import process causes index data to be distributed accross many data pages. Rebuilding them resolves that.

Josip Medved
Thanks for the suggestion. The crux of the issue though, is that it's not solved by the first rebuild. I always have to do it a second time after the users report the problem.
Tom H.
+1  A: 

I suspect that simply adding the indexes the first time is not rebuilding the statistics. Try do a DBCC DBREINDEX on the table after you load it. You may also want to insure that you have a Clustered Index.

RBarryYoung
There is a clustered index on the PK. Thanks for the idea though.
Tom H.
+1  A: 

Statistics, probably, but not on the indexes

The optimiser will pick up the number of changed rows/no stats for the first query. It decides to rebuild/create stats.

However: there may be column level statistics that are not associated with an index.

The 2nd rebuild is irrelevant for stat purposes, because the column stats already exists, but it force the execution plans to be discarded and reevaluated

Edit:

SQLServerPedia:

...Column statistics are not touched by the index rebuild process...

gbn
+1  A: 

Maybe it just takes a really long time to complete the indexing. How long have you waited after the first index rebuilding?

Update: I see it's really an over the weekend-thing, and that implies the indexing just doesn't work right the first time. In that case I don't have any suggestions beyond what's been said so far.

dlamblin
We typically test the import over the weekend, completing by Sunday and late Monday morning it's still an issue. When I run the script it only takes 2 to 5 minutes to complete. Thanks for the suggestion though.
Tom H.
+1  A: 

Drop the indexes before you do your bulk insert of the data. That'll allow the data to be inserted much more quickly. Also disable any triggers on the relevant table(s) before loading your data.

Then, add your indexes. This avoids the unnecessarily redundant index-rebuilding that you're currently doing.

Also, as one user already pointed out, it makes more sense to use DBCC DBREINDEX over dropping & re-adding the indexes. You can also update statistics, of course.

UPDATE: Since DBCC DBREINDEX is deprecated (the command, not the concept), use ALTER INDEX with the REBUILD option.

Garrett
Thanks for the advice. The import isn't a straight bulk insert though.Updating the statistics isn't going to solve the problems caused by large data changes, such as fragmentation, etc.DBCC DBREINDEX is on the chopping block for SQL Server I believe.
Tom H.
A: 

I recall reading somewhere that SQL Server uses the current statistics when creating an index. If the statistics are out of date, the index being created can be optimised for the wrong cases and give poorly performing results.

Try updating the statistics on your table before creating the indices.

The UPDATE STATISTICS entry in BOL indicates that this can happen:

The Database Engine keeps statistics about the distribution of the key values in each index and uses these statistics to determine which index or indexes to use in query processing. Users can create statistics on nonindexed columns by using the CREATE STATISTICS statement. Query optimization depends on the accuracy of the distribution steps:

  • If there is significant change in the key values in the index, rerun UPDATE STATISTICS on that index.
  • If lots of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated by using the TRUNCATE TABLE statement and then repopulated, use UPDATE STATISTICS.

Since you have imported millions of rows into an empty table, I would say that you have hit one of the cases above.

adrianbanks
Thanks for the post. I'll try updating the statistics, but I think that your understanding of the process is a little off. Statistics are not used in the creation of indexes - they're used in the determination of the query plan and which indexes it should use. Index creation relies only on the data which it is indexing.
Tom H.
Yes you are correct. I was mixing up creation with re-indexing. I think what I originally read referred to certain re-indexing commands (that do not update the statistics as well) actually use the current statistics when re-indexing. I cannot find the reference to this now though, so it may be wrong as well.
adrianbanks