views: 257
answers: 2

I have a stored procedure which filters based on the result of the DATEADD function. My understanding is that this is similar to using user-defined functions: because SQL Server cannot store statistics based on the output of the function, it has trouble evaluating the cost of an execution plan.

The query looks a little like this:

SELECT /* Columns */
FROM TableA
JOIN TableB ON TableA.id = TableB.join_id
WHERE DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now

(So it's not possible to pre-calculate the outcome of the DATEADD.)

What I'm seeing is a truly terrible execution plan, which I believe is due to SQL Server incorrectly estimating the number of rows returned from one part of the tree as 1, when in fact it's ~65,000. I have, however, seen the same stored procedure execute in a fraction of the time when different (not necessarily less) data is present in the database.
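
(For reference, here's how I've been checking the estimates against the actual row counts. SET STATISTICS PROFILE returns an EstimateRows and a Rows column for every operator after the statement runs; the procedure call below is just a placeholder for the real one:)

DECLARE @Now DATETIME
SET @Now = GETDATE()

-- Shows per-operator EstimateRows vs actual Rows after execution
SET STATISTICS PROFILE ON
EXEC dbo.MyProcedure @Now = @Now  -- placeholder name, not the real procedure
SET STATISTICS PROFILE OFF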

My question is: in cases like these, how does the query optimiser estimate the outcome of the function?

UPDATE: FYI, I'm more interested in understanding why I get a good execution plan some of the time and a bad one the rest of the time - I already have a pretty good idea of how I'm going to fix this in the long term.

+1  A: 

It would help to see the function, but one thing I have seen is that burying functions like that in queries can result in poor performance. If you can evaluate some of it beforehand you might be in better shape. For example, instead of

WHERE MyDate < GETDATE()

Try

DECLARE @Today DATETIME
SET @Today = GETDATE()
...
WHERE MyDate < @Today

This seems to perform better.
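
Applied to the query in the question it would look something like this (assuming the @Now value is currently being computed inline in the WHERE clause - rough sketch, not tested):

-- Capture the current time once, up front, instead of calling GETDATE() per row
DECLARE @Now DATETIME
SET @Now = GETDATE()

SELECT /* Columns */
FROM TableA
JOIN TableB ON TableA.id = TableB.join_id
WHERE DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now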

n8wrl
I still don't understand why, but this instantly solved it...
Kragen
Because SQL Server has no way to accurately estimate the number of rows affected when the columns are inside a function. When the columns are not inside functions, it can use the column statistics to get quite good estimates.
GilaMonster
+2  A: 

It's not the costing of the plan that's the problem here. The function on the columns prevents SQL Server from doing index seeks. You're going to get an index scan or a table scan.

What I'd suggest is to see if you can get one of the columns out of the function; basically, see if you can move the function to the other side of the comparison. It's not perfect, but it means that at least one column can be used for an index seek.

Something like this (rough idea, not tested), with an index on TableB.HoursDifferent and an index on the join column in TableA:

DATEDIFF(hour, TableA.StartDate, @Now) >= TableB.HoursDifferent
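
Fleshed out a little (again untested; the index names are mine, and the INCLUDE columns would need to match what the query actually selects):

-- Supporting indexes: one on the rewritten predicate column, one on the join column
CREATE INDEX IX_TableB_HoursDifferent ON TableB (HoursDifferent) INCLUDE (join_id)
CREATE INDEX IX_TableA_id ON TableA (id) INCLUDE (StartDate)

SELECT /* Columns */
FROM TableA
JOIN TableB ON TableA.id = TableB.join_id
WHERE TableB.HoursDifferent <= DATEDIFF(hour, TableA.StartDate, @Now)

The idea is that for each row coming from TableA the right-hand side is a constant, so the predicate on TableB.HoursDifferent becomes a seekable range.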

On the costing side, I suspect that the optimiser will use the 30%-of-the-table 'thumb-suck', because it can't use statistics to get an accurate estimate and because it's an inequality; meaning it's going to guess that 30% of the table will be returned by that predicate.

It's really hard to say anything for sure without seeing the execution plans. You mention an estimate of 1 row and an actual of 65,000. In some cases, that's not a problem at all. http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/

GilaMonster
It is indeed doing table/index scans; however, each table only has a handful of entries (600 or so). The problem is caused because SQL Server ends up doing ~65,000 RID lookups (on a table which only contains 600 rows!). Again, I'm sorry I can't show you an execution plan, but it wouldn't make much sense without knowing the whole context, which as I say involves 10 different tables, a 250-line stored procedure, and masses of indexes.
Kragen