Suppose I have two tables of data. One has a clustered index (CINDEX); the other is a heap (HEAP).

Both also have a non-clustered index on the same column, SEARCHCOL.

Assume my clustered index key columns are the same size as a rowid, so the depth of the two non-clustered indexes is the same.

Which would take fewer I/Os to fetch a table row?

a) SELECT * FROM CINDEX WHERE SEARCHCOL = :1

b) SELECT * FROM HEAP WHERE SEARCHCOL = :1

Choose a or b, explain why, and state any assumptions.

+1  A: 

If SEARCHCOL is selective enough, the plan should do the expected seek into the non-clustered index (identical between the two) and then look up the clustered index or the heap to get all the columns needed to satisfy the * projection. This latter lookup will be faster in a heap (a direct fetch by page:slot) than a BTree seek (which has to hit 1-2 non-leaf pages to land on the leaf page containing the row), provided the heap row was not moved. If the heap row was moved, the lookup has to chase the forwarding pointer to a new page, meaning another logical read IO, and so on until it finds the row (if it moved multiple times). So in general a heap would save 1-2 logical read IOs (the non-leaf part of the seek in the lookup).
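One way to check this on your own data is to compare the logical reads the two lookups report. This is a minimal sketch, assuming the tables live in the dbo schema and SEARCHCOL is an int (both are assumptions, as is the sample value 42). The second query reads the clustered index depth and the heap's forwarded record count, the two factors that decide the difference.

```sql
-- Assumption: SEARCHCOL is an int; 42 is an arbitrary sample value.
DECLARE @val int = 42;

-- Report logical reads per table on the Messages tab.
SET STATISTICS IO ON;

SELECT * FROM dbo.CINDEX WHERE SEARCHCOL = @val;  -- seek + key lookup
SELECT * FROM dbo.HEAP   WHERE SEARCHCOL = @val;  -- seek + RID lookup

SET STATISTICS IO OFF;

-- Depth of the clustered index and forwarded records in the heap.
-- index_id 0 is the heap, index_id 1 is the clustered index.
-- forwarded_record_count is NULL for anything that is not a heap.
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       ps.index_type_desc,
       ps.index_depth,
       ps.forwarded_record_count
FROM (VALUES (OBJECT_ID('dbo.CINDEX')),
             (OBJECT_ID('dbo.HEAP'))) AS t(object_id)
CROSS APPLY sys.dm_db_index_physical_stats(
                DB_ID(), t.object_id, NULL, NULL, 'DETAILED') AS ps
WHERE ps.index_id IN (0, 1)
  AND ps.index_level = 0
  AND ps.alloc_unit_type_desc = 'IN_ROW_DATA';
```

Compare the "logical reads" figures for the two statements. Note that 'DETAILED' mode reads every page of the objects it inspects, so run it on a test copy rather than a busy production table.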

If SEARCHCOL is not selective enough and the query hits the Tipping Point, then all bets are off: one plan would do a clustered index scan in key order, while the other would do a heap scan in allocation order (they would end up at roughly the same IO).

But I have to warn that this kind of minutiae measurement (1-2 page IOs) is not a healthy basis for a heap vs. BTree decision. My take is to always choose a BTree unless there are explicit reasons not to. The usual explicit reason is INSERT performance (this is where heaps run circles around BTrees), and that implies ETL data load scenarios: the data is loaded into a heap for fast upload performance, then the heap is turned into a clustered index and added to the big fact table with a switch operation.
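A rough sketch of that ETL pattern follows. Everything named here is hypothetical: the dbo.FactSales table, its columns, the file path, the date range and the partition number are placeholders for illustration, and the fact table is assumed to be partitioned by SaleDate with a matching clustered index.

```sql
-- 1. Staging heap with the same columns as the fact table.
--    It must sit on the same filegroup as the target partition.
CREATE TABLE dbo.FactSales_Staging (
    SaleDate  date  NOT NULL,
    ProductId int   NOT NULL,
    Amount    money NOT NULL
);

-- 2. Bulk load into the heap. TABLOCK makes the load eligible for
--    minimal logging under the simple or bulk-logged recovery model.
BULK INSERT dbo.FactSales_Staging
FROM 'C:\load\sales_2010_06.dat'   -- hypothetical file
WITH (TABLOCK);

-- 3. Turn the heap into a clustered index matching the fact table.
CREATE CLUSTERED INDEX CIX_FactSales_Staging
    ON dbo.FactSales_Staging (SaleDate, ProductId);

-- 4. A trusted check constraint proves that every row belongs
--    to the target partition's range.
ALTER TABLE dbo.FactSales_Staging WITH CHECK
    ADD CONSTRAINT CK_FactSales_Staging_Range
    CHECK (SaleDate >= '20100601' AND SaleDate < '20100701');

-- 5. Metadata-only switch into the (empty) target partition.
ALTER TABLE dbo.FactSales_Staging
    SWITCH TO dbo.FactSales PARTITION 6;   -- hypothetical partition number
```

The switch itself moves no data, so the cost of building the clustered index is paid once on the staging table rather than row by row during the load.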

Remus Rusanu
Maybe that's minutiae in the average SQL Server database... but I make a living tuning one or two IOs out of high performance databases. I think there should be a radiobutton on questions... (x) I need an enterprise big-boy answer or ( ) I have a departmental app with a handful of users and no real SLA. But again I thank you for confirming my suspicion. And the wider the row is, the fewer rows per leaf block and the deeper the b-tree table is vs. the heap, so the more of those "unobtrusive" IOs appear; and the more billions of rows, the deeper again... so wide or long tables get worse and worse comparatively.
Stephanie Page
Just to be clear, it's not the one or two IOs on a single query... it's when that query gets executed 10,000 times per minute.
Stephanie Page
Fair enough. Comparing Tables Organized with Clustered Indexes versus Heaps: http://msdn.microsoft.com/en-us/library/cc917672.aspx
Remus Rusanu
Remus, I've read a lot of articles on the net and soooo many are filled with gratuitous assertions. Some articles say, CI's make everything go faster... it's the infamous FASTER = TRUE config setting that is always correct. So I'm trying to tease out fact from myth. I was going to post a long multiple part question but I thought that wouldn't work on SO. So, I'm posting sequential questions, one building on the previous.
Stephanie Page
There's a strong community here at SO, I hope you'll get your answers. Sometimes though things are not black or white and more often than not the answer will be 'it depends'...
Remus Rusanu