views:

3707

answers:

3

Since both a Table Scan and a Clustered Index Scan essentially scan all records in the table, why is a Clustered Index Scan supposedly better?

As an example - what's the performance difference between the following when there are many records?:

declare @temp table(
    SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp

-----------------------------

declare @temp table(
    RowID int not null identity(1,1) primary key,
    SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp
+2  A: 

A table scan has to examine every single row of the table. The clustered index scan only needs to scan the index. It doesn't scan every record in the table. That's the point, really, of indices.

DrPizza
+1  A: 

http://msdn.microsoft.com/en-us/library/aa216840(SQL.80).aspx

The Clustered Index Scan logical and physical operator scans the clustered index specified in the Argument column. When an optional WHERE:() predicate is present, only those rows that satisfy the predicate are returned. If the Argument column contains the ORDERED clause, the query processor has requested that the rows' output be returned in the order in which the clustered index has sorted them. If the ORDERED clause is not present, the storage engine will scan the index in the optimal way (not guaranteeing the output to be sorted).

http://msdn.microsoft.com/en-us/library/aa178416(SQL.80).aspx

The Table Scan logical and physical operator retrieves all rows from the table specified in the Argument column. If a WHERE:() predicate appears in the Argument column, only those rows that satisfy the predicate are returned.

Stu
+12  A: 

In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.

A clustered table, however, has it's data pages linked in a doubly linked list - making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on INSERTs, UPDATEs, and DELETEs.

If your query has a RANGE operator (eg., SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.

And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.

So:

  • For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.

  • For a query with a WHERE clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.

  • For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.

  • For INSERTs, UPDATEs, and DELETEs a heap should win - as there's no overhead in maintaining order.

I quickly glanced at this Microsoft whitepaper, which has some actual numbers, but I don't think there should be any surprises in there.

Mark Brackett