views:

98

answers:

1

Adding a seemingly perfectly index is having an unexpectedly adverse affect on a query performance...

-- [Data] has a predictable structure and a simple clustered index of the primary key:
ALTER TABLE [dbo].[Data] ADD PRIMARY KEY CLUSTERED ( [ID] )

-- Joins on itself looking for a certain kind of "overlapping" records
SELECT DISTINCT
    [Data].ID AS [ID]
FROM 
    dbo.[Data] AS [Data]
JOIN
    dbo.[Data] AS [Compared] ON
    [Data].[A] = [Compared].[A] AND
    [Data].[B] = [Compared].[B] AND
    [Data].[C] = [Compared].[C] AND
    ([Data].[D] = [Compared].[D] OR [Data].[E] = [Compared].[E]) AND
    [Data].[F] <> [Compared].[F]
WHERE 1=1
    AND [Data].[A] = @A
    AND @CS <= [Data].[C] AND [Data].[C] < @CE -- Between a range

[Data] has about a quarter-million records so far, 10% to 50% of the data satisfies the where clause depending on @A, @CS, and @CE. As is, the query takes 1 second to return about 300 rows when querying 10%, and 30 seconds to return 3000 rows when querying 50% of the data.

Curiously, the estimated/actual execution plan indicates two parallel Clustered Index Scans, but the clustered index is only of the ID, which isn't part of the conditions of the query, only the output. ??

If I add this hand-crafted [IDX_A_B_C_D_E_F] index which I fully expected to improve performance, the query slows down by a factor of 8 (8 seconds for 10% & 4 minutes for 50%). The estimated/actual execution plans show an Index Seek, which seems like the right thing to be doing, but why so slow??

CREATE UNIQUE INDEX [IDX_A_B_C_D_E_F] 
    ON [dbo].[Data] ([A], [B], [C], [D], [E], [F])
    INCLUDE ([ID], [X], [Y], [Z]);

The Data Engine Tuning wizard suggests a similar index with no noticeable difference in performance from this one. Moving AND [Data].[F] <> [Compared].[F] from the join condition to the where clause makes no difference in performance.

I need these and other indexes for other queries. I'm sure I could hint that the query should refer to the Clustered Index, since that's currently winning - but we all know it is not as optimized as it could be, and without a proper index, I can expect the performance will get much worse with additional data.

What gives?

== Edit ==

For Gail, here are the execution plans. Of course, the one that references the index is the one queried with the index available. This is a little different than my original description of the clustered index scan - I deleted the auto-gen PK index for testing and can't get it back(?), so this is without ANY indexes at all, hence the table scan. Different look to the query plan, but no noticeable change in performance. (Table Scan is the fast one)

execution plans

Indexed.sqlplan

Nonindexed.sqlplan

+2  A: 

It's doing the CI scan because the CI is the actual data. An index is just a placeholder to the actual data.

An index seek is definitely the incorrect thing to do on a 50% return query, and it's rare to see one used even on a 10% return rate. Usually if it's over a couple of percent, it's going to scan (that's why on smaller tables you can count on a scan to happened nearly every time).

I would suggest making sure the stats are up to date for that table, and possibly make sure that the index isn't in need of maintenance itself.

UPDATE STATS - http://msdn.microsoft.com/en-us/library/ms187348.aspx

Mike M.
Oh, I thought it would say "Table Scan". Makes sense. Thanks.I don't understand the rest though - won't the index seek be able to get to the A and B and C and (D or E) neighborhood much quicker than doing an unsorted table scan? Even ABC neighborhood would reduce the self-joined comparisons to only hundreds per ABC group. Basically, I can't see how trudging though the table couldn't be improved by the pre-sorted index.This is all fresh data, fresh indexes - the statistics should already be updated, as the article you referenced suggests they would be.
uosɐſ
As much as I want to understand the problem, I also want to achieve the practical result of getting the same output faster. Any ideas on how to do this quicker?
uosɐſ
I'm not entirely sure how everything works internally to say why the seek is so much more expensive than the scan. If anyone can post that I'd be very interested in reading, as well.Gail Shaw, a very well respected SQL Server MVP, has a post on this as well. She found that (even with a covering index), that it switched from a seek to a scan at approximately 0.4(!!!)% return.http://sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/
Mike M.
Yeah, and in this case it isn't switching - it's stupidly using the slow way, even advising to make the index if I don't already have one.
uosɐſ
Mike, that blog post you cite is only for the case of a non-covering index. The reason SQL switches to a table/clustered index scan is because of the cost of key lookups, and there are no key lookups when the index is covering.In the case of a covering nonclustered index, SQL happily (and correctly) seeks right up to 100% of the rows affected, because all the info it needs is in the nonclustered index and typically nonclustered indexes are smaller (and hence faster) than clustered indexes.
GilaMonster
If you want to get morr information about seek vs scan, read Kimberley Tripps articles about Tipping Points - she makes it very clear what and when to expect the plan to tip from NC Seek to CL Scan for non-covering index situations.
Andrew
@GilaMonster - that's what's confusing me - this index appears to me to cover the query nicely. Have you solved this before?
uosɐſ
Oh, you're Gail! Hi!
uosɐſ
Yes, it is a covering index, if it wasn't it likely wouldn't even be used with the rows affected.
GilaMonster
@Gila/uo - I apologize for the shoddy advice. I mis-understood the article and worse yet passed off the mis-understood information. Sorry for giving out false advice and associating it to you Gail. Very embarrassing. Ugh.
Mike M.
Dude, it's ok, we're working it through - besides, you got Gail on the case.
uosɐſ
Well, Mike. You lead me to Gail, but then she's gone, so you get the points!
uosɐſ