ansaurus

Question

Answer 1

+2 A:

You can change clustered index to ([QueryTime], [ResultID]), or change your query from

select top 1 * from results where querytime>'2009-05-01' order by ResultID asc

to

select top 1 <only the columns you actually need> from results where querytime>'2009-05-01' order by ResultID asc

and include all those columns in [IDX_ResultDate2]

AlexKuznetsov 2009-07-09 18:14:44

+1 exactly - shoot for a "covering" index that includes all the fields needed to satisfy the query (if that's possible)

marc_s 2009-07-09 18:30:51

Yep, already doing that (not posted here) but same kind of performance.

Jason Kealey 2009-07-09 22:01:25

Answer 2

+7 A:

I would assume that a binary search in as sorted list of QueryTime followed by peeking at the first result in it's child list of ResultIDs is the fastest way at getting the result. (Hence my initial sort order).

That would be fast indeed, but your query expresses a different request: You are asking for the Result with the minimal ResultId from all the queries that occurred after '2009-05-01'. To satisfy the request it has to seek at the beginning of the range ('2009-05-01'), start a scan from this position to extract all the ResultId, sort them then return the top 1 (the minimum ResultId). The second index you added [idx_ResultDate2] doesn't help much either. The query has to do pretty much exactly the same seek and scan: the ResultIds are sorted whithin a result date, so to find out the top ResultId from all results that are after '2009-05-01' the query still has to scan the index till end.

On your last index, [IDX_ResultDate3], the query is cheating. What it does it starts a scan on the inde and its looking at the QueryTime value, knowing that in this index scan the first Result that has a QueryTime in the desired range (> '2009-05-01') is the one you want (because the ResultId is guaranteed to be the Top 1). You get the result in a 'fraction of a second' out of pure luck: you have a matching Result in the beginning of the index. The query may well scan the entire index and match the very lat Result. You can insert a new Result with a QueryTime like '2010-01-01' and then seek for it, you'll see that the performance degrades as the query has to scan the entire index till end (still faster than a table scan because of the narrower index size).

My question is: are you absolutely positive that your query has to return TOP 1 in ORDER BY ResultID? Or you just picked the order by arbitrarily? If you can change the ORDER BY request to, say, QueryTime, then any of the index (updated: with QueryTime as leftmost column) will return a simple Seek and Fetch, no scansn and no sorting.

Remus Rusanu 2009-07-09 18:17:44

Very good explanation. I understand now. I'll see if I can re-engineer the app to use the QueryTime sort.

Jason Kealey 2009-07-09 22:00:45

Answer 3

+2 A:

You have a ranged filtering condition on one field along with ORDER BY another field.

An index, even a composite index, cannot be used to serve both conditions in this case.

When you create an index on (queryTime, resultId), the index is used for filtering. The engine still needs to order the resultset.

When you create an index on (resultId, queryTime), the index is used for ordering.

Since you need a TOP 1 result and the row that satisifes this result happens to be in the beginning of the index, the latter approach turns out to perform better.

If your filtering condition would be selective (i. e. it would return few rows), and the first result you need would happen to be in the end of the index, the first approarch would be better.

See this article in my blog for some more explanations and hints on which index to create in which conditions:

Choosing index

Quassnoi 2009-07-09 18:18:06

Nice blog post.

Jason Kealey 2009-07-09 21:57:36

Answer 4

A:

The first thing I would suggest is to check if the statistics for this table (all the indexes) are up-to-date.

Since you get two different execution plans with different data sets, it seems that SQL Server is making an infamous "judgement call" in picking one execution plan over another.

I agree with Remus's explanation of why you are getting "magical" results with your last index.

His suggestion is also good - do you really want to order by resultID? Or if you can order by queryTime, then you will have MUCH better performance because the execution plan will be able to use the index order as the result set's order (AND it will seek through the index, vs. scanning).

Jeff Meatball Yang 2009-07-09 18:31:38

Yes, the statistics are up to date. (and yes, it needs to be sorted... unfortunately!)

Jason Kealey 2009-07-09 22:02:18

Answer 5

A:

I'm not sure I can answer the question but would point out that the clustered index key is already included as part of any other index, so its redundant to include ResultID as part of any of the other indexes you propose.

JNappi 2009-07-10 01:22:40

ansaurus

tags:

views:

answers:

SQL Server Index Order (datetime field)

related questions