I have a fairly simple query:

SELECT
     col1,
     col2…
FROM
     dbo.My_Table
WHERE
     col1 = @col1 AND
     col2 = @col2 AND
     col3 <= @col3

It was performing horribly, so I added an index on col1, col2, col3 (int, bit, and datetime). When I checked the query plan, it was ignoring my index. I tried reordering the columns in the index in every possible configuration, and it always ignored the index. When I run the query it does a clustered index scan (the table has between 700K and 800K rows) and takes 10-12 seconds. When I force it to use my index, it returns instantly. I was careful to clear the cache and buffers between tests.
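For reference, the test looks roughly like this (the index name here is made up, and the full select list is elided as above):

-- the index that gets ignored
CREATE INDEX IX_My_Table_col1_col2_col3 ON dbo.My_Table (col1, col2, col3)

-- clear the plan cache and buffer pool between runs (test server only!)
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

DECLARE @col1 INT, @col2 BIT, @col3 DATETIME
SET @col1 = 1             -- placeholder test values
SET @col2 = 0
SET @col3 = '2009-07-09'

-- forcing the index makes the query return instantly
SELECT
     col1,
     col2 -- … rest of the select list elided
FROM
     dbo.My_Table WITH (INDEX (IX_My_Table_col1_col2_col3))
WHERE
     col1 = @col1 AND
     col2 = @col2 AND
     col3 <= @col3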

Other things I’ve tried:

UPDATE STATISTICS dbo.My_Table

CREATE STATISTICS tmp_stats ON dbo.My_Table (col1, col2, col3) WITH FULLSCAN

Am I missing anything here? I hate to put an index hint in a stored procedure, but SQL Server just can’t seem to get a clue on this one. Anyone know any other things that might prevent SQL Server from recognizing that using the index is a good idea?

EDIT: One of the columns being returned is a TEXT column, so using a covering index or an INCLUDE won't work :(

+1  A: 

The order of the index is important for this query:

CREATE INDEX MyIndex ON MyTable (col3 DESC, col2 ASC, col1 ASC)

It's not so much the ASC/DESC as that when SQL Server goes to match that WHERE clause, it can match on col3 first and walk the index along that value.

Joel Coehoorn
For this query, the order of the index should be just as @Tom H. created it.
Quassnoi
I did try several possible orders for the columns. All gave the same result.
Tom H.
+1  A: 

The SQL Server optimizer is not good at optimizing queries that use variables.

If you are sure that you will always benefit from using the index, just put a hint.

If you put literal values into the query instead of variables, it will pick the correct statistics and use the index.

You may also try a lighter hint:

OPTION (OPTIMIZE FOR (@col1 = 1, @col2 = 0, @col3 = '2009-07-09'))

, which will calculate the best execution plan for these values of the variables, using statistics, and won't stick to using the index no matter what.
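In context it looks like this (the literal values are just placeholders; pick values representative of your data):

SELECT
     col1,
     col2 -- …
FROM
     dbo.My_Table
WHERE
     col1 = @col1 AND
     col2 = @col2 AND
     col3 <= @col3
OPTION (OPTIMIZE FOR (@col1 = 1, @col2 = 0, @col3 = '2009-07-09'))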

Quassnoi
If I run the query outside of the SP with the column values hard-coded, it still uses a clustered index scan :(
Tom H.
@Tom: could you please post your exact table definition?
Quassnoi
+1  A: 

Have you tried tossing out the bit from the index?

CREATE INDEX ix1 ON My_Table (Col3, Col1) INCLUDE (Col2)
-- include other columns from the select list if needed

Also, you've left out the rest of the columns from the select list. If there aren't many, you might want to include those as well, either in the index key or in an INCLUDE clause, to create a covering index for the query.

Scott Ivey
+1  A: 

Try masking your parameters to prevent parameter sniffing:

CREATE PROCEDURE MyProc
 @Col1 INT
 -- etc...
AS
 DECLARE @MaskedCol1 INT
 SET @MaskedCol1 = @Col1
 -- etc...

 SELECT
   col1,
   col2…
 FROM
   dbo.My_Table
 WHERE
   col1 = @MaskedCol1 AND
   -- etc...

Sounds stupid, but I've seen SQL Server do some weird things because of parameter sniffing.

Kragen
Thanks for the suggestion. I can run the SELECT outside of the SP though and I still see the same situation.
Tom H.
+1  A: 

I bet SQL Server thinks the price of getting the rest of the columns (designated by ... in your example) from the clustered index outweighs the benefit of the index, so it just scans the clustered key. If so, see if you can make this a covering index (sketched below).

Or does it use another index instead?
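A covering version would look roughly like this; col4 and col5 are hypothetical stand-ins for the elided columns, and note that per the EDIT the TEXT column cannot go in an INCLUDE:

CREATE INDEX IX_My_Table_cover
ON dbo.My_Table (col1, col2, col3)
INCLUDE (col4, col5)  -- every remaining column in the select list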

zvolkov
It's using the clustered key if I don't force the index use. The column list includes all of the columns in the table. While I could put a big covering index on that, I'd be effectively duplicating the table. I'll need to look into the INSERT/UPDATE/DELETE frequencies to see if the cost is warranted.
Tom H.
A: 

Are the columns nullable? Sometimes SQL Server thinks it has to scan the table to find NULL values.

Try adding "AND col1 IS NOT NULL" to the query; it might make SQL Server use the index without a hint.
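Something like this, with the rest of the query unchanged:

SELECT
     col1,
     col2 -- …
FROM
     dbo.My_Table
WHERE
     col1 = @col1 AND
     col1 IS NOT NULL AND -- logically redundant, but may nudge the optimizer
     col2 = @col2 AND
     col3 <= @col3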

Also, check if the statistics are really up to date:

SELECT
    ObjectName     = OBJECT_NAME(ind.object_id),
    IndexName      = ind.name,
    StatisticsDate = STATS_DATE(ind.object_id, ind.index_id)
FROM sys.indexes ind
ORDER BY STATS_DATE(ind.object_id, ind.index_id) DESC
Andomar
Unlike Oracle, SQL Server indexes NULLs just as well and an index always covers all rows.
Quassnoi
My theory was that sometimes the statistics show there are many rows with NULLs. Then SQL Server reverts to a table scan to cover the case "col1 is null".
Andomar
@Andomar: this query can never cover rows for which `COL1` is `NULL`
Quassnoi
+6  A: 

You have 800k rows indexed by col1, col2, col3. Col2 is a bit, so its selectivity is 50%. Col3 is checked on a range (<=), so its selectivity will be roughly 50% too. Which leaves col1. The query is compiled for the generic, parameterized plan, so it has to account for the general case. If you have 10 distinct values of col1, then your index will return approximately 800k / 10 * 25%, that is, about ~20k keys to look up in the clustered index to retrieve the '...' part. If you have 10k distinct col1 values, then the index will return just 20 keys to look up.

As you can see, what matters in this case is not how you build your index, but the actual data. Based on the selectivity of col1, the optimizer will choose a plan based on a clustered index scan (as better than 20k key lookups, each lookup costing at least 3-5 page reads) or one based on the non-clustered index (if col1 is selective enough). In real life the distribution of col1 also plays a role, but going into that would complicate the explanation too much.
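You can check what the optimizer sees for col1 yourself, for example against the tmp_stats statistics created in the question:

-- rows per distinct col1 value: the higher this is, the worse the index filters
SELECT COUNT(DISTINCT col1) AS distinct_col1, COUNT(*) AS total_rows
FROM dbo.My_Table

-- or inspect the density vector of the column statistics directly
DBCC SHOW_STATISTICS ('dbo.My_Table', tmp_stats) WITH DENSITY_VECTOR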

You can claim with the benefit of hindsight that the plan is wrong, but the plan is the best cost estimate based on the data available at compile time. You can influence it with hints (an index hint as you suggest, or an OPTIMIZE FOR hint as Quassnoi suggests), but then your query may perform better for your test set and far worse for a different set of data, say for the case when @col1 = <the value that matches 500k records>. You can also make the index covering, thus eliminating the clustered index lookups required by the '...' in the projection list, in which case the non-clustered index is always a better cost match than the clustered scan.

Kimberly Tripp has a blog article covering this subject; she calls it the index 'tipping point'. It explains how an apparently perfect candidate index can be ignored: a non-clustered index that does not cover the projection list and has poor selectivity will be seen as more costly than a clustered scan.

Remus Rusanu
Thanks for the information and suggestions. It gives me some ideas to look into.
Tom H.
A: 

If your SELECT is returning columns that aren't in your index, SQL may find that it's more efficient to scan the clustered index instead of doing a key lookup to find the other values that you are requesting.

If you have a TEXT column, try switching the data type to VARCHAR(MAX) and then including the values in the nonclustered index.
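A sketch of that change; the TEXT column name here is hypothetical:

-- TEXT columns cannot appear in an INCLUDE, but VARCHAR(MAX) columns can
ALTER TABLE dbo.My_Table ALTER COLUMN notes_col VARCHAR(MAX)

CREATE INDEX IX_My_Table_incl_text
ON dbo.My_Table (col1, col2, col3)
INCLUDE (notes_col)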

mrdenny