I am joining a table that has two record ID fields (Record1, Record2) to a view twice, once on each record field, and selecting the top 1000 rows. The view consists of several rather large tables, and its ID field is a string concatenation of their respective IDs (this was necessary for some third-party software that requires a unique ID for the view; row numbering was abysmally slow). There is also a WHERE clause in the view that calls a function comparing dates.

The estimated execution plan produces a "No Join Predicate" warning unless I use OPTION (FORCE ORDER). With the order forced, the execution plan has multiple nodes displaying 100% cost. In both cases, the estimated subtree cost at the final node is thirteen orders of magnitude smaller than that of just one of its child nodes (it is doing a lot of nested loop joins, with CPU costs as high as 35,927,400,000,000).

What is going on here with the numbers in the execution plan? And why is SQL Server having such a hard time optimizing the query?

Simply adding an index to the view on the concatenated string and using the NOEXPAND table hint fixed the problem entirely (a sketch of the fix follows the query below); the query then ran in all of 12 seconds. But why did SQL Server stumble so badly, even requiring the NOEXPAND hint after I added the index?

Running SQL Server 2008 SP1 with CU 8.

The View:

SELECT
    dbo.fnGetCombinedTwoPartKey(N.NameID, A.AddressID) AS NameAddressKey,
    [other fields]
FROM
    [7 joined tables]
WHERE dbo.fnDatesAreOverlapping(N.dtmValidStartDate, N.dtmValidEndDate, A.dtmValidStartDate, A.dtmValidEndDate) = 1

The Query:

SELECT TOP 1000
    vw1.strFullName,
    vw1.strAddress1,
    vw1.strCity,
    vw2.strFullName,
    vw2.strAddress1,
    vw2.strCity
FROM tblMatches M
JOIN vwImportNameAddress vw1 ON vw1.NameAddressKey = M.Record1 
JOIN vwImportNameAddress vw2 ON vw2.NameAddressKey = M.Record2
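
For reference, the fix amounted to roughly the following (the index name is illustrative, and indexing a view requires it to be created WITH SCHEMABINDING with deterministic, precise functions):

    -- Materialize the view on the concatenated key (name illustrative)
    CREATE UNIQUE CLUSTERED INDEX IX_vwImportNameAddress_NameAddressKey
        ON dbo.vwImportNameAddress (NameAddressKey);

    -- NOEXPAND tells the optimizer to use the indexed view as-is
    -- instead of expanding it to its base tables
    SELECT TOP 1000
        vw1.strFullName, vw1.strAddress1, vw1.strCity,
        vw2.strFullName, vw2.strAddress1, vw2.strCity
    FROM tblMatches M
    JOIN vwImportNameAddress vw1 WITH (NOEXPAND) ON vw1.NameAddressKey = M.Record1
    JOIN vwImportNameAddress vw2 WITH (NOEXPAND) ON vw2.NameAddressKey = M.Record2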
+1  A: 

SQL Server would have to parse your function (fnGetCombinedTwoPartKey) to determine which columns are fetched to create the result column. It can't, so it has to assume that all columns are necessary. If your indexes are covering indexes, then your estimate is going to be wrong.
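
If the function really does nothing more than concatenate the two IDs, a hypothetical inline version (the separator and the varchar sizes below are assumptions) would let the optimizer see exactly which columns feed the key:

    -- Hypothetical inline equivalent of dbo.fnGetCombinedTwoPartKey;
    -- the '|' separator and varchar sizes are guesses
    SELECT
        CAST(N.NameID AS varchar(20)) + '|' +
        CAST(A.AddressID AS varchar(20)) AS NameAddressKey,
        [other fields]
    FROM
        [7 joined tables]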

Jay
+1  A: 

Looks like you're already pretty close to the explanation. It's because of this:

The view consists of several rather large tables, and its ID field is a string concatenation of their respective IDs...

This creates a non-sargable join predicate, and it prevents SQL Server from using any of the indexes on the base tables. Thus, the engine has to perform a full scan of all the underlying tables for each join (two in your case).
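
To illustrate the difference (the split M.Record1NameID/M.Record1AddressID columns and the base-table names below are hypothetical):

    -- Non-sargable: the key exists only after concatenation, so no index
    -- on the base tables' NameID or AddressID columns can be used to seek
    JOIN vwImportNameAddress vw1 ON vw1.NameAddressKey = M.Record1

    -- Sargable, in a hypothetical schema where tblMatches stores the
    -- component IDs separately
    JOIN tblName N    ON N.NameID    = M.Record1NameID
    JOIN tblAddress A ON A.AddressID = M.Record1AddressID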

Perhaps in order to avoid doing several full table scans (one for each table, multiplied by the number of joins), SQL Server decided that it would be faster to simply compute the Cartesian product and filter afterward (hence the "No Join Predicate" warning). When you FORCE ORDER, it dutifully performs all of the full scans and nested loop joins that you originally asked for.

I do agree with some of the comments that this view points to a problematic data model, but the short-term workaround, as you've discovered, is to index the computed ID column in the view, which makes the join sargable again because the index stores the actual generated key values rather than computing them row by row.


Edit: I also missed this on the first read-through:

WHERE dbo.fnDatesAreOverlapping(N.dtmValidStartDate,N.dtmValidEndDate,A.dtmValidStartDate,A.dtmValidEndDate) = 1

This, again, is a non-sargable predicate that will lead to poor performance; wrapping columns in a UDF will always cause this behaviour. Indexing the view also materializes it, which likely factors into the query's speed as well: without the index, this predicate has to be evaluated on every execution and forces a full scan of the base tables, even aside from the composite ID.
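
Assuming the UDF implements the standard interval-overlap test, it could be inlined so the optimizer can work with the date columns (and any indexes on them) directly:

    -- Inline equivalent, assuming fnDatesAreOverlapping tests simple
    -- overlap of closed [start, end] ranges (NULL handling omitted)
    WHERE N.dtmValidStartDate <= A.dtmValidEndDate
      AND A.dtmValidStartDate <= N.dtmValidEndDate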

Aaronaught
I'm curious, though, why SQL Server doesn't properly take advantage of the index without the NOEXPAND hint. Looks like I should chalk up the bad numbers in the estimate to SQL Server not being able to handle a very bad query over very large data sets, and likewise for the plan itself.
Brian
@Brian: In the absence of evidence to the contrary, I would probably blame out-of-date statistics. If you have to use `NOEXPAND` then it means that the optimizer thinks it will be cheaper to query the base tables instead of using the index on the view; the only reasons I can think of for that are either (a) more non-sargable predicates that aren't shown in your final query examples, or (b) the optimizer thinks that the base table queries will be way cheaper than they really are (which is usually due to bad statistics). If you're sure that the predicates are fine, try `sp_updatestats`.
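
A minimal form of that suggestion (the base-table name is illustrative):

    -- Refresh statistics for all tables in the current database...
    EXEC sp_updatestats;

    -- ...or target one of the view's base tables with a full scan
    -- (table name assumed)
    UPDATE STATISTICS dbo.tblName WITH FULLSCAN;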
Aaronaught
Oh - it could also be the result of a non-covering index. If the materialized view doesn't actually have all of the necessary output columns, then it effectively has to join the view to every single base table, which it might deem to be very expensive. Make sure you're using `INCLUDE` properly on your index.
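
For example, on a hypothetical base table (all names illustrative):

    -- A covering index carries the queried columns in its leaf pages,
    -- so lookups never have to touch the base rows
    CREATE NONCLUSTERED INDEX IX_tblName_NameID_Covering
        ON dbo.tblName (NameID)
        INCLUDE (strFullName);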
Aaronaught
I think I have a pretty good idea of the general opinion on keeping the UDF and relying on the hint vs. replacing it with the underlying logic (which gets very verbose as the number of date ranges that must overlap increases)...
Brian