Using SQL Server 2005. This is something I've noticed while doing some performance analysis.

I have a large table with about 100 million rows. I'm comparing the performance of different indexes on the table, to see which is optimal for my test scenario, which performs about 10,000 inserts on that table (among other operations on other tables). While my test is running, I'm capturing an SQL Profiler trace, which I load into an SQL table when the test has finished so I can analyse the stats.
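(Loading the trace into a table is along these lines - the file path and table name below are just placeholders:)

    -- Load the saved Profiler trace file into a table for analysis.
    -- Path and table name are placeholders for my actual ones.
    SELECT *
    INTO dbo.TraceResults
    FROM sys.fn_trace_gettable('C:\traces\index_test.trc', DEFAULT);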

The first test run after recreating a different set of indexes on the table is very noticeably slower than subsequent runs - typically about 10-15 times slower for the inserts into this table.

Each time, I clear the data and execution plan cache before the test.
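(The cache clearing is along these lines:)

    -- Flush dirty pages, then clear the buffer pool and the plan cache
    -- before each run (standard SQL Server 2005 commands; test server only).
    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;
    DBCC FREEPROCCACHE;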

What I want to know is: why is performance initially poorer with a newly created set of indexes? And is there a way I can monitor what's happening on that first run to find the cause?

+3  A: 

One possibility is that the default fill factor of zero is coming into play.

This means that there's 'no room' in the index to accommodate your inserts. When you insert, a page split in the index is needed, which adds some empty space to store the new index information. As you carry out more inserts, more space is created in the index. After a while the rate of splitting goes down, because your inserts are hitting pages that are not fully filled, so splits are not needed. An insert that requires a page split is more expensive than one that doesn't.

You can set the fill factor when you create the index. It's a classic trade-off between space used and the performance of different operations.
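For example - a sketch with placeholder table/column/index names - rebuilding the index with a lower fill factor leaves free space on each page:

    -- Recreate the index leaving ~20% free space per page,
    -- so early inserts land in existing pages instead of splitting them.
    CREATE NONCLUSTERED INDEX IX_BigTable_FieldA
        ON dbo.BigTable (FieldA)
        WITH (FILLFACTOR = 80, DROP_EXISTING = ON);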

I'm going to include a link to some Sybase ASE docs, 'cos they are nicely written and mostly applicable to SQL Server too.

martin clayton
For some odd reason I didn't even think about fill factor :/ That's definitely something I will test out - I think you could be right. Ideally, I'd like to see these page splits occurring so I can actually watch them happen - any idea how to monitor for them?
AdaTheDev
Perfmon is built in; it instruments the server (so it slows things down), but it captures loads of data, including page splits/sec.
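If you'd rather query it than watch the perfmon UI, the same counter is exposed through a DMV - a rough sketch (the value is cumulative, so sample it before and after a run and take the difference):

    -- Page Splits/sec is a cumulative counter; diff two samples
    -- to see how many splits happened during a test run.
    SELECT pc.counter_name, pc.cntr_value
    FROM sys.dm_os_performance_counters AS pc
    WHERE pc.object_name LIKE '%Access Methods%'
      AND pc.counter_name LIKE 'Page Splits/sec%';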
martin clayton
Cheers - will give that a whirl - I like to "see" things happening to directly relate them to spikes in performance!
AdaTheDev
Spot on cheers, I've been able to monitor the page splits and really see their effect.
AdaTheDev
+1  A: 

Just to clarify:

1) You build an index on a table with 100m pre-existing rows.

2) You insert 10k rows into the table

3) You insert another 10k rows into the table

Step 3 is 10x faster than step 2?

What kind of index is the new index - non-clustered, right? Inserts on a clustered index behave very differently. Also, is there any significant difference in the profile of the two batches of inserts? Depending on the clustered index, they can behave quite differently. Typically, the table should either have no clustered index or be clustered on an increasing key.
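For instance - a sketch with placeholder names - clustering on an ever-increasing key means new rows land on the last page instead of splitting pages in the middle of the table:

    -- Cluster on an ever-increasing key (e.g. an IDENTITY or insert datetime)
    -- so inserts append to the end rather than splitting existing pages.
    CREATE CLUSTERED INDEX CIX_BigTable_Id
        ON dbo.BigTable (Id);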

Cade Roux
Yep, correct on those 3 steps. I *am* trying different clustered indexes. I run some tests with a clustered index on FieldA, then I drop it and create a clustered index on FieldB to see how that performs in comparison. I know they will have different behaviour - it's more that the first test run after changing the index is noticeably slower than subsequent test runs with the same index.
AdaTheDev
Well, a clustered index is not really an index. It's just the ordering of the data pages for the rows. When you have a clustered index, you really would prefer that it not cause a lot of page splits, so usually it's easiest just to ensure that the clustering matches the insertion order - or simply have a heap table with only non-clustered indexes (perhaps with included columns to better cover your reading needs). Remember, all indexes really only slow down INSERTs.
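As a rough sketch (table and column names are just placeholders), the heap approach looks like this - no clustered index at all, with a covering non-clustered index for the reads:

    -- Leave the table as a heap (no clustered index); a non-clustered
    -- index with included columns covers the common read queries.
    CREATE NONCLUSTERED INDEX IX_BigTable_FieldB
        ON dbo.BigTable (FieldB)
        INCLUDE (FieldC, FieldD);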
Cade Roux
Yes, part of the reason I'm doing this testing is to ensure we have only the indexes really needed in order to get the right balance between insert/update speed and read speed. Good suggestion about considering a heap table with only nonclustered indexes - that could well benefit me - will try that out too.
AdaTheDev