I've hit an interesting snag (interesting to me at least). Below is a general idea of what my query looks like. Assume @AuthorType is an input to the stored procedure and that there are various specialized conditions each place I've put comments.

SELECT *
FROM TBooks
WHERE
(--...SOME CONDITIONS)
OR
(@AuthorType = 1 AND --...DIFFERENT CONDITIONS)
OR
(@AuthorType = 2 AND --...STILL MORE CONDITIONS)

What's interesting to me is that if I execute this SP with @AuthorType = 0, it runs slower than if I remove the last two sets of conditions (the ones that add conditions for specialized values of @AuthorType).

Shouldn't SQL Server realize at runtime that those conditions will never be met and ignore them entirely? The difference I'm experiencing is not small; it roughly doubles the duration of the query (from 1-2 seconds to 3-5 seconds).

Am I expecting SQL Server to optimize this too much for me? Do I really need to have 3 separate SPs for specialized conditions?

+4  A: 

It has to do with how difficult it is for the optimizer to handle OR-type logic, combined with parameter sniffing. Try changing your query above to a UNION ALL approach like the one mentioned in the post here, i.e. you'll wind up with multiple statements unioned together, each with a single @AuthorType = x predicate, allowing the optimizer to rule out the branches whose AND logic doesn't match the given @AuthorType and to seek into the appropriate indexes in turn. It would look something like this:

SELECT *
FROM TBooks
WHERE
(--...SOME CONDITIONS)
AND @AuthorType = 1 AND (--...DIFFERENT CONDITIONS)
union all
SELECT *
FROM TBooks
WHERE
(--...SOME CONDITIONS)
AND @AuthorType = 2 AND (--...STILL MORE CONDITIONS)
union all
...
chadhoc
I appreciate the technique and the linked SO page. Both will get me thinking. Perhaps David B is right that I should fight the urge to reduce duplication...but man, that really doesn't feel right to me.
Jon Smock
+5  A: 

Shouldn't SQL Server realize at runtime that those conditions will never be met and ignore them entirely?

No, absolutely not. There are two factors at play here.

  1. SQL Server does not guarantee boolean operator short-circuit. See On SQL Server boolean operator short-circuit for an example showing clearly how query optimization can reverse the order of boolean expression evaluation. While at first impression this seems like a bug to the imperative, C-like programming mindset, it is the right thing to do in the declarative, set-oriented world of SQL.

  2. OR is the enemy of SQL SARGability. SQL statements are compiled into an execution plan, and then the plan is executed. The plan is reused between invocations (it is cached), so the SQL compiler has to generate one single plan that fits all the separate OR cases (@AuthorType=1 OR @AuthorType=2 OR @AuthorType=3). When it comes to generating the query plan, it is exactly as if @AuthorType held all values at once, in a sense. The result is almost always the worst possible plan, one that cannot benefit from any index because the various OR branches contradict each other, so it ends up scanning the whole table and checking rows one by one.
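To make point 1 concrete, here is a hypothetical illustration (the variable and predicates are made up): in a short-circuiting language the division could never be reached, but SQL Server is free to reorder the predicates.

```sql
DECLARE @d int = 0;

-- A C-like language would evaluate @d = 0 first and skip 1 / @d.
-- SQL Server makes no such guarantee: the optimizer may evaluate
-- 1 / @d > 1 first, so this CAN raise a divide-by-zero error.
SELECT 1 WHERE @d = 0 OR 1 / @d > 1;
```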

The best thing to do in your case, and in any other case that involves boolean OR, is to move the @AuthorType check outside the query:

IF (@AuthorType = 1)
  SELECT ... FROM ... WHERE ...
ELSE IF (@AuthorType = 2)
  SELECT ... FROM ... WHERE ...
ELSE ...

Because each branch is clearly separated into its own statement, SQL can create the proper access path for each individual case.
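Filled in with the question's TBooks (SomeCondition, DifferentCondition and StillMoreCondition are hypothetical stand-ins for the commented-out conditions), the pattern would look like:

```sql
IF (@AuthorType = 1)
    SELECT * FROM TBooks
    WHERE SomeCondition = 1 OR DifferentCondition = 1;
ELSE IF (@AuthorType = 2)
    SELECT * FROM TBooks
    WHERE SomeCondition = 1 OR StillMoreCondition = 1;
ELSE
    SELECT * FROM TBooks
    WHERE SomeCondition = 1;
```

Each SELECT is compiled on its own, so the @AuthorType = 0 case is no longer burdened by predicates that can never apply to it.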

The next best thing is to use UNION ALL, the way chadhoc already suggested; that is the right approach in views or other places where a single statement is required (no IF is permitted).

Remus Rusanu
A: 

I should fight the urge to reduce duplication...but man, that really doesn't feel right to me.

Would this "feel" better?

SELECT ... lots of columns and complicated stuff ...
FROM 
(
    SELECT MyPK
    FROM TBooks
    WHERE 
    (--...SOME CONDITIONS) 
    AND @AuthorType = 1 AND (--...DIFFERENT CONDITIONS) 
    union all 
    SELECT MyPK
    FROM TBooks
    WHERE 
    (--...SOME CONDITIONS) 
    AND @AuthorType = 2 AND (--...STILL MORE CONDITIONS) 
    union all 
    ... 
) AS B1
JOIN TBooks AS B2
    ON B2.MyPK = B1.MyPK
JOIN ... other tables ...

The derived table B1 is just the WHERE clause that gets the PKs. That is then joined back to the original table (and any other tables that are required) to get the "presentation" columns. This avoids duplicating the presentation columns in every UNION ALL branch.

You can take this a step further and insert PKs into a temporary table first, and then join that to the other tables for the presentation aspect.

We do this for very large tables where the user has lots of choices on what to query on.

DECLARE @MyTempTable TABLE
(
    MyPK int NOT NULL,
    PRIMARY KEY
    (
        MyPK
    )
)

IF @LastName IS NOT NULL
BEGIN
   INSERT INTO @MyTempTable
   (
        MyPK
   )
   SELECT MyPK
   FROM MyNamesTable
   WHERE LastName = @LastName -- Lets say we have an efficient index for this
END
ELSE
IF @Country IS NOT NULL
BEGIN
   INSERT INTO @MyTempTable
   (
        MyPK
   )
   SELECT MyPK
   FROM MyNamesTable
   WHERE Country = @Country -- Got an index on this one too
END

... etc

SELECT ... presentation columns
FROM @MyTempTable AS T
    JOIN MyNamesTable AS N
        ON N.MyPK = T.MyPK -- a PK join, V. efficient
    JOIN ... other tables ...
        ON ....
WHERE     (@LastName IS NULL OR LastName = @LastName)
      AND (@Country IS NULL OR Country = @Country)

Note that all the tests are repeated [technically you don't need the @LastName one :)], including obscure ones which were (let's say) not in the original filters used to create @MyTempTable.

The creation of @MyTempTable is designed to make the best of whatever parameters are available. Perhaps filtering on both @LastName AND @Country at once, when both are supplied, is far more efficient at filling the table than either one alone, so we create a case for that scenario.
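Such a combined branch, reusing the names from the example above, might look like this sketch (it assumes an index on LastName, Country, or both, makes the combined filter worthwhile):

```sql
-- This branch must come before the single-parameter ones in the IF/ELSE chain.
IF @LastName IS NOT NULL AND @Country IS NOT NULL
BEGIN
   INSERT INTO @MyTempTable (MyPK)
   SELECT MyPK
   FROM MyNamesTable
   WHERE LastName = @LastName
     AND Country = @Country
END
```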

Scaling problems? Review what actual queries are being made and add cases for the ones that can be improved.

Kristen