I am struggling to understand what a clustered index in SQL Server 2005 is. I read the MSDN article Clustered Index Structures (among other things), but I am still unsure whether I understand it correctly.

The (main) question is: what happens if I insert a row (with a "low" key) into a table with a clustered index?

The above mentioned MSDN article states:

The pages in the data chain and the rows in them are ordered on the value of the clustered index key.

And Using Clustered Indexes for example states:

For example, if a record is added to the table that is close to the beginning of the sequentially ordered list, any records in the table after that record will need to shift to allow the record to be inserted.

Does this mean that if I insert a row with a very "low" key into a table that already contains a gazillion rows, literally all rows are physically shifted on disk? I cannot believe that. This would take ages, no?

Or is it rather (as I suspect) that there are two scenarios depending on how "full" the first data page is.

  • A) If the page has enough free space to accommodate the record it is placed into the existing data page and data might be (physically) reordered within that page.
  • B) If the page does not have enough free space for the record a new data page would be created (anywhere on the disk!) and "linked" to the front of the leaf level of the B-Tree?

This would then mean the "physical order" of the data is restricted to the "page level" (i.e. within a data page) but not to the pages residing on consecutive blocks on the physical hard drive. The data pages are then just linked together in the correct order.

Or formulated in an alternative way: if SQL Server needs to read the first N rows of a table that has a clustered index it can read data pages sequentially (following the links) but these pages are not (necessarily) block wise in sequence on disk (so the disk head has to move "randomly").

How close am I? :)

+1  A: 

How close are you? Very!

These articles may help consolidate your understanding:

http://msdn.microsoft.com/en-us/library/aa964133(SQL.90).aspx

http://www.sql-server-performance.com/articles/per/index_fragmentation_p1.aspx

Daniel Renshaw
+1  A: 

If you happen to insert a row with a "low" ID as you say, then yes - it will be placed in the vicinity of the other rows that are already there with similar IDs.

If the SQL Server page the row belongs on (pages are 8 KB each) is already full, then a page split will occur - roughly half the rows will remain on that page, and the other half will be moved to a new page. Both pages will then have some room for new rows.
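To make the page split concrete, here is a minimal toy model in Python. It is only a sketch, not how SQL Server is implemented: `PAGE_CAPACITY` and `insert_key` are made-up names, a "page" is just a sorted list of keys, and the position of a page in the outer list stands for its place in the logical chain, not its location on disk.

```python
import bisect

PAGE_CAPACITY = 4  # toy value (assumption); a real 8 KB page holds as many rows as fit


def insert_key(pages, key):
    """Insert key into a non-empty list of sorted pages kept in logical key order.

    Returns True if the insert caused a page split, False otherwise.
    """
    # Pick the first page whose largest key is >= key; fall back to the last page.
    idx = len(pages) - 1
    for i, page in enumerate(pages):
        if page and key <= page[-1]:
            idx = i
            break

    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        # Enough free space: only the rows within this one page are reordered.
        bisect.insort(page, key)
        return False

    # Page is full: move the upper half to a new page linked right after it.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    pages.insert(idx + 1, new_page)

    # The new key goes into whichever half now covers its range.
    target = page if key <= page[-1] else new_page
    bisect.insort(target, key)
    return True


pages = [[10, 20, 30, 40], [50, 60, 70, 80]]
print(insert_key(pages, 5))  # True: the first page was full, so it split
print(pages)                 # [[5, 10, 20], [30, 40], [50, 60, 70, 80]]
```

Note that only the rows of the one full page were touched; the second page, and everything after it, stayed exactly where it was.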

That's one of the reasons why you don't want to use something very random, e.g. a GUID, as your clustering key: it will cause rows to be inserted all over the place.

Trying to avoid page splits (which are quite expensive operations) is one of the main reasons why gurus like Kimberly Tripp heavily advocate using something that is ever-increasing as your clustering key - e.g. an INT IDENTITY column. Here, a new value is normally larger than anything that's already in the table, so new rows are always added at the "end" of the food chain.
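A rough way to see the difference is to count splits in a toy simulation. Again this is a sketch under stated assumptions, not SQL Server's real algorithm: `count_splits` and `PAGE_CAPACITY` are hypothetical names, pages are plain sorted lists, and an insert past the end of a full last page is modelled as simply allocating a new page without moving any rows.

```python
import bisect
import random

PAGE_CAPACITY = 100  # toy value (assumption): keys per page


def count_splits(keys):
    """Insert keys one by one; return (number of page splits, number of pages)."""
    pages = [[]]  # leaf pages in logical key order
    splits = 0
    for key in keys:
        last = pages[-1]
        if not last or key > last[-1]:
            # Key belongs at the logical end of the index.
            if len(last) < PAGE_CAPACITY:
                last.append(key)
            else:
                pages.append([key])  # modelled as a new page, no rows moved
            continue

        # Otherwise find the first page whose largest key is >= key.
        idx = next(i for i, p in enumerate(pages) if key <= p[-1])
        page = pages[idx]
        if len(page) >= PAGE_CAPACITY:
            # Full page in the middle of the chain: split it in half.
            mid = len(page) // 2
            new_page = page[mid:]
            del page[mid:]
            pages.insert(idx + 1, new_page)
            splits += 1
            if key > page[-1]:
                page = new_page
        bisect.insort(page, key)
    return splits, len(pages)


n = 10_000
sequential = range(n)                           # like an ever-increasing IDENTITY
shuffled = random.Random(0).sample(range(n), n)  # stand-in for a random key such as a GUID

print("sequential:", count_splits(sequential))  # (0, 100): no splits, full pages
print("random:    ", count_splits(shuffled))    # many splits, more and emptier pages
```

In this model the ever-increasing key fills every page completely and never splits one, while the random key order keeps splitting pages and leaves them partly empty.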

For more excellent background info, see Kimberly Tripp's blog - especially her Clustering Key category!

marc_s