I need clarification on why SQL Server would choose a particular type of index access in the following excerpt from a blog post:

Let’s assume for simplicity's sake that col1 is unique and ever-increasing in value, that col2 has 1000 distinct values, that there are 10,000,000 rows in the table, that the clustered index consists of col1, and that a nonclustered index exists on col2.

Imagine the query execution plan created for the following initially passed parameters: @P1 = 1 and @P2 = 99.

These values would result in an optimal query plan for the following statement using the substituted parameters:

Select * from t where col1 > 1 or col2 > 99 order by col1;

Now, imagine the query execution plan if the initial parameter values were: @P1 = 6,000,000 and @P2 = 550.

As before, an optimal query plan would be created after substituting the passed parameters:

Select * from t where col1 > 6000000 or col2 > 550 order by col1;

These two identical parameterized SQL statements would potentially create and cache very different execution plans due to the difference in the initially passed parameter values. However, since SQL Server only caches one execution plan per query, chances are very high that in the first case the query execution plan will utilize a clustered index scan because of the ‘col1 > 1’ parameter substitution, whereas in the second case a query execution plan using an index seek would most likely be created.

from: http://blogs.msdn.com/sqlprogrammability/archive/2008/11/26/optimize-for-unknown-a-little-known-sql-server-2008-feature.aspx
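
The caching behavior the excerpt describes can be sketched with a toy model. This is only an illustration, not SQL Server's actual logic: `choose_plan`, `execute`, `plan_cache`, and the 0.5 cutoff are hypothetical names and values, col1 is assumed to hold the values 1 to 10,000,000, and only the col1 predicate is considered, as in the excerpt.

```python
# Toy model of "one cached plan per query text"; not SQL Server's real logic.
TOTAL_ROWS = 10_000_000   # table size from the example
SCAN_THRESHOLD = 0.5      # hypothetical cutoff, chosen only for illustration

plan_cache = {}  # query text -> plan chosen on first execution


def choose_plan(p1):
    """Pick a plan from the col1 predicate alone (col2 is ignored here)."""
    # Assumes col1 holds the values 1..TOTAL_ROWS, so this counts col1 > p1.
    qualifying = max(TOTAL_ROWS - p1, 0)
    fraction = qualifying / TOTAL_ROWS
    return "clustered index scan" if fraction > SCAN_THRESHOLD else "index seek"


def execute(query_text, p1):
    """Return the plan used: compiled on the first call, reused afterwards."""
    if query_text not in plan_cache:
        plan_cache[query_text] = choose_plan(p1)
    return plan_cache[query_text]


QUERY = "select * from t where col1 > @P1 or col2 > @P2 order by col1"

first = execute(QUERY, 1)            # plan compiled for @P1 = 1
second = execute(QUERY, 6_000_000)   # same query text, so the cached plan is reused
```

In this sketch, whichever parameter values arrive first decide the plan that every later call with the same query text receives.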

Why would the first query use a clustered index scan, while the second query uses an index seek?

+1  A: 

Assuming that the columns contain only positive integers:

SQL Server would look at the statistics for the table and see that, for the first query, virtually all rows in the table meet the criterion col1 > 1, so it chooses to scan the clustered index.

For the second query, a considerably smaller proportion of rows would meet the criterion col1 > 6000000, so using an index seek would improve performance.

Ian Nelson
+1  A: 

In cases where the optimizer sees that the majority of the table will be returned by the query, as in the first query, it's more efficient to perform a scan than a seek.

Where only a small portion of the table will be returned, as in the second query, an index seek is more efficient.

A scan will touch every row in the table whether it qualifies or not. The cost is proportional to the total number of rows in the table. A scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate.

A seek will touch only the rows that qualify and the pages that contain those rows. Its cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.
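
The proportionality argument above can be made concrete with a rough sketch. The constants and the functions `scan_cost` and `seek_cost` are illustrative assumptions, not SQL Server's real costing formulas; cost is counted in pages read.

```python
# Toy cost model; the constants are illustrative assumptions, not SQL Server's.
ROWS_PER_PAGE = 100        # hypothetical number of rows stored on one page
SEEK_OVERHEAD_PAGES = 3    # hypothetical pages read walking down the index tree


def scan_cost(total_rows):
    """Pages read by a full scan: every page, whether its rows qualify or not."""
    return total_rows / ROWS_PER_PAGE


def seek_cost(qualifying_rows):
    """Pages read by a seek followed by a range read over qualifying rows only."""
    return SEEK_OVERHEAD_PAGES + qualifying_rows / ROWS_PER_PAGE


total = 10_000_000
for qualifying in (1_000, 4_000_000, 9_999_999):
    # The scan cost stays fixed while the seek cost grows with qualifying rows.
    print(qualifying, scan_cost(total), seek_cost(qualifying))
```

Under these assumptions the seek is far cheaper when few rows qualify, and its advantage disappears as the qualifying range approaches the whole table.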

Nick Kavadias
+1  A: 

Notice that in both cases the clustered index will be used. In the first example it is a clustered index SCAN, whereas in the second example it will be a clustered index SEEK, which in most cases will be faster, as the author of the blog states.

SQL Server knows that the clustered index is increasing. Therefore, it will do a clustered index scan in the first case.

Jakob Christensen