views:

160

answers:

2

I have quite a few stored procedures following the pattern of selecting the row for which a date column is the latest up to a certain date, inclusive. I see two forms in use:

select top 1 x, y, z from sometable where a=b and date <= @date order by date desc

or

select x, y, z from sometable where a=b and date=(select max(date) from sometable where a=b and date <= @date)

I can imagine a derivation of the second form that uses a join instead of a subquery too.

We can ignore the case where the second form may return multiple rows. Assume it never will.

Since this is used in a lot of places, some of which against large numbers of rows in performance critical code, I want to standardise on whichever is the more optimal solution (which may be some other suggestion).

Some googling has turned up numerous comparisons of TOP 1 vs MAX, but generally for a single value, and no subquery. In that case MAX is the clear winner, but I'm not sure if the subquery changes that.

I'd appreciate the views of those more knowledgable than I in this area (which should be most of you!).

+1  A: 

The query optimizer has a lot of freedom, and it can execute both MAX or TOP 1 in various ways. Exactly what it does depends on the source query, the available indexes, and the statistics for your table, among other things. Tomorrow it might choose a different appreach, as the size of your table or its distribution change.

So I don't think there's one optimal solution. Wait for actual performance issues and optimize them one by one.

Andomar
Thanks for your response. I appreciate the premature optimisation concern. Part of what I m doing is tuning the performance. The fact that actual performance may vary is why I'm asking here rather than just measuring - i.e. I'm interested in whether there is anything fundamentally different between the approaches that will thwart the query optimiser - or if there is a better approach. Your comments suggest that, in this instance, they should probably be seen as equivalent...?
Phil Nash
I'm racking my brains trying to figure out what the various ways might be: If date is not indexed, it will scan, if date is indexed but doesn't cover, it will seek with a lookup, if date is indexed and it covers, it will simply seek - What other approaches might it take?
Mike DeFehr
For example it would scan if it thinks there are many rows with the same date, even though there is an index.
Andomar
The only component that *might* scan in this scenario is the outer query of the MAX - but this is only if the unique data assumption does not hold or there is something wrong with your statistics (either way you have other problems) - the subquery can seek (if it has an index) regardless of the number of matching rows, and the TOP 1 will always *technically* scan (even with a unique index) but it will always stop after it reads 1 - so, not really, no...
Mike DeFehr
If Sql Server thinks it will match many rows, it tries to avoid the lookup associated with an index seek. That is often the right approach: a table scan for 100 hits is often faster than 100 index seeks + 100 table lookups. But sometimes it's wrong, like one case I saw, where someone used an index hint to reduce execution time from 2 hours to 2 seconds. Never say never :)
Andomar
That is absolutely correct, however, it has very little to do with this question because with the TOP 1 technique,SQL server knows it's only getting one row, so it goes to the end of the index, grabs the first row it finds, does its lookup and calls it a day regardless of the uniqueness of date - the possibility of a scan is only introduced with the MAX technique, thus making it suboptimal. You are zeroing in on the notion that *if* the statictics are way out of date *and* the OP's assumption is invalid then there could be alternate plans - doesn't answer the question
Mike DeFehr
My answer was a general "don't optimize prematurely". I apologize for zeroing in in response to your comments.
Andomar
+2  A: 

Your results may vary depending on table design, but generally speaking, the TOP 1 / Order by technique is 2 times better when there is no index on date because SQL server has to do a scan for each query - first to find the max date, then to look up the rest of the values based on it. When there is an index on date (whether it covers the query or not) the plan is the same.

The most important thing to consider here is indexing. If this query is to be executed a lot, you'll want to make sure you index the date field.

Both from the standpoint of the TOP 1 being optimal in certain circumstances and the issue you already touched on: the MAX could return more than 1 row (don't assume it won't someday, by the way, unless there is a unique index on date), I definitely prefer the TOP 1 technique - it is the technique I use for all such queries.

Mike DeFehr
horribly out of date statistics could theoretically cause the outer query in the MAX technique to scan if the optimizer somehow gets fooled into thinking it is not selective enough for a seek, but I doubt it - either way, another reason to use TOP 1 - it wouldn't fall into that trap
Mike DeFehr
So what you're saying is that TOP 1 gives all the advantages, but no disadvantages - wheras MAX might give some disadvantages in some circumstances - but with other things as they should be there's not much in it? That makes a lot of sense. Thanks for the comments
Phil Nash
Correct - that's what I'm saying - however - far more important than how you write the query is what indexes you have - this is definitely true here and is very often true elsewhere. Another thing I'll stress primarily for the benefit of others, is that the two queries are not logically equivalent - they will only yield the same results if there is a unique index on date or the maximum date happens to be a unique value
Mike DeFehr
In my case I'm not currently in a position to influence the indexing, but I believe in most of the instances I'm looking at there is an index on date.
Phil Nash
Alright, then you're doing what you can. Anything else to answer your question?
Mike DeFehr