What should I do to get an Clustered Index Seek instead of Clustered Index Scan?

If the Query incldues more than a certain percentage of the rows in the table, the optimizer will elect to do a scan instead of a seek, because it predicts that it will require fewer disk IOs in that case (For a Seek, It needs one Disk IO per level in the index for each row it returns), whereas for a scan there is only one disk IO per row in the entire table.

So if there are, say 5 levels in the b-tree Index, then if the query will generate more than 20% of the rows in the table, it is cheaper to read the whole table than make 5 IOs for each of the 20% rows...

Can you narrow the output of the query a bit more, to reduce the number of rows returned by this step in the process? That would help it choose the seek over the scan.

W/o any detail is hard to guess what the problem is, and even whether is a problem at all. The choice of a scan instead of a seek could be driven by many factors:

The query expresses a result set that covers the entire table. Ie. the query is a simple SELECT * FROM <table>. This is a trivial case that would be perfectly covered by a clustred index scan with no need to consider anything else.
The optimizer has no alternatives:
- the query expresses a subset of the entire table, but the filtering predicate is on columns that are not part of the clustered key and there are no non-clustred indexes on those columns either. These is no alternate plan other than a full scan.
- The query has filtering predicates on columns in the clustred index key, but they are not SARGable. The filtering predicate usually needs to be rewritten to make it SARGable, the proper rewrite depends from case to case. A more subtle problem can appear due to implicit conversion rules, eg. the filtering predicate is WHERE column = @value but column is VARCHAR (Ascii) and @value is NVARCHAR (Unicode).
- The query has SARGale filtering predicates on columns in the clustered key, but the leftmost column is not filtered. Ie. clustred index is on columns (foo, bar) but the WHERE clause is on bar alone.
The optimizer chooses a scan.
- When the alternative is a non-clustered index then scan (or range seek) but the choice is a to use the clustered index the cause can be usually tracked down to the index tipping point due to lack of non-clustered index coverage for the query projection. Note that this is not your question, since you expect a clustered index seek, not a non-clustred index seek (assumming the question is 100% accurate and documented...)
- Cardinality estimates. The query cost estimate is based on the clustered index key(s) statistics which provide an estimate of the cardinality of the result (ie. how many rows will match). On a simple query This cannot happen, as any estimate for a seek or range seek will be lower than the one for a scan, no matter how off the statistics are, but on a complex query, with joins and filters on multiple tables, things are more complex and the plan may include a scan where a seek was expected because the query optimizer may choose plan on which the join evaluation order is reversed to what the observer expects. The reverse order choice may e correct (most times) or may be problematic (usually due to statistics being obsolete or to parameter sniffing).
- An ordering guarantee. A scan will produce results in a guaranteed order and elements higher on the execution tree may benefit from this order (eg. a sort or spool may be eliminated, or a merge join can be used instead of hash/nested joins). Overall the query cost is better as a result of choosing an apparently slower access path.

These are some quick pointers why a clustered index scan may be present when a clustered index seek is expected. The question is extremly generic and is impossible to give an answer 'why', other than relying on an 8 ball. Now if I take your question to be properly documented and correctly articulated, then to expect a clustered index seek it means you are searching an unique record based on a clustred key value. In this case the problem has to be with the SARGability of the WHERE clause.

"On a simple query This cannot happen, as any estimate for a seek or range seek will be lower than the one for a scan" ... not true... If only 10% of rows have `Special = true (1)`, then the simple query `Select * From table where Special = 0` Will always choose a table Scan. This is the case even when an index exists on the Special column, as long as the index is not the clustered index. (If the index is the clustered index, then the optimizer will of course choose a range seek.)

Charles Bretana 2010-05-05 19:31:53

@Charles: but that's the 'index tipping point' at work, which I just mentioned. Btw, your example query is not 'simple' (ie. Trivial, see http://msdn.microsoft.com/en-us/library/aa226174%28SQL.70%29.aspx): "a SELECT statement where all the columns are in a unique covering index and there is no other index that has that set of columns in it."

Remus Rusanu 2010-05-05 20:50:59

@Remus, if that's what you meant by "Simple" then I can agree technically, but in context, w/o that clarification as to your meaning of the word 'simple' a reader would get a very different, and incorrect, impression. If you put the word 'simple' in quotes or highlighted, with yr definition in parens or as a footnote, then fine... Otherwise the reader assumes that any 'simple' query will not have this issue, and that is incorrect, for the reasons you mention. ... to exacerbate the issue, you then semi-define complex as related to join complexity, which is not related to the tipping issue.

Charles Bretana 2010-05-06 01:01:28

I recommend you run SET STATISTICS IO ON; and then invoke the procedure. This will send to the messages panel the number of physical and logical reads and writes. If there are large (100.000s) logical reads they indicate scans that are causing performance problems.

Remus Rusanu 2009-09-01 16:26:55

ansaurus

tags:

views:

answers:

What should I do to get an Clustered Index Seek instead of Clustered Index Scan?

related questions