views:

223

answers:

5

I have three tables:

SmallTable
   (id int, flag1 bit, flag2 bit)
JoinTable
   (SmallTableID int, BigTableID int)
BigTable
   (id int, text1 nvarchar(100), otherstuff...)

SmallTable has, at most, a few dozen records. BigTable has a few million, and is actually a view that UNIONS a table in this database with a table in another database on the same server.

Here's the join logic:

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
    AND (s.flag2=1 OR b.text1 <> 'value1')

Average joined size is a few thousand results. Everything shown is indexed.

For most SmallTable records, flag1 and flag2 are set to 1, so there's really no need to even access the index on BigTable.text1, but SQL Server does anyway, leading to a costly Indexed Scan and Nested Loop.

Is there a better way to hint to SQL Server that, if flag1 and flag2 are both set to 1, it shouldn't even bother looking at text1?

Actually, if I can avoid the join to BigTable completely in these cases (JoinTable is managed, so this wouldn't create an issue), that would make this key query even faster.

A: 

No idea if this will be faster without test data... but it sounds like it might

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1=1) AND (s.flag2=1)
 UNION ALL
 SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1=0 AND b.text1 NOT LIKE 'pattern1%')
    AND (s.flag2=0 AND b.text1 <> 'value1')

Please let me know what happens

Also, you might be able to speed this up by just returning just a unique id for this query and then using the result of that to get all the rest of the data.

edit

something like this?

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1=1) AND (s.flag2=1)
 UNION ALL
 SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE EXISTS
    (SELECT 1 from BigTable b
     WHERE   
    (s.flag1=0 AND b.text1 NOT LIKE 'pattern1%')
    AND (s.flag2=0 AND b.text1 <> 'value1')
)
Hogan
Tried this, it was actually much slower when either flag was set to 1.
richardtallent
I think there might we a way to do this with extra joins and a force order, but my brain is to fuzzy to think of it right now.
Hogan
A: 

It's not elegant, but it should work...

SELECT * FROM
    SmallTable s
    INNER JOIN JoinTable j ON j.SmallTableID = s.ID
    INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
    (s.flag1 = 1 and s.flag2 = 1) OR 
    (
       (s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
       AND (s.flag2=1 OR b.text1 <> 'value1')
    )
Josh Kodroff
Thanks... tried this, but the execution was identical from what I could tell.
richardtallent
+1  A: 

I don't believe SQL Server will short-circuit conditions like that unfortunately.

SO I'd suggest doing 2 queries and UNION them together. First query with s.flag1=1 and s.flag2=1 WHERE conditions, and the second query doing the join on to BigTable with the s.flag1<>1 a s.flag2<>1 conditions.

This article on the matter is worth a read, and includes the bottom line:

...SQL Server does not do short-circuiting like it is done in other programming languages and there's nothing you can do to force it to.

Update:
This article is also an interesting read and contains some good links on this topic, including a technet chat with the development manager for the SQL Server Query Processor team which briefly mentions that the optimizer does allow short-circuit evaluation. The overall impression I get from various articles is "yes, the optimizer can spot the opportunity to short circuit but you shouldn't rely on it and you can't force it". Hence, I think the UNION approach may be your best bet. If it's not coming up with a plan that takes advantage of an opportunity to short cut, that would be down to the cost-based optimizer thinking it's found a reasonable plan that does not do it (this would be down to indexes, statistics etc).

AdaTheDev
Good comment, but while SQL Server doesn't shortcut expression evaluation, it *does* do query optimization. So, if my query is limited to a result set of SmallTable where both flags are `1`, it should be smart enough to skip the check of `BigTable.text1`, but that would require changing the execution plan based on the data. I think that's the core problem.
richardtallent
Yes, I see what you're saying but I still see it as shortcutting - the optimizer would need to be able to see the opportunity for short-circuiting. I've done some tests myself in the past on this kind of thing, and that's what I've seen (no optimisation in this manner).
AdaTheDev
The problem is that for an `or` it can evaluate in any order and it is doing the right hand side first.
Hogan
@Hogan, I think the actual issue is that SQL Server always does *both* sides and merges them, not realizing that the left side may always return a "1" and thus make the right side check superfluous. It's a limitation of having a single, non-procedural execution plan for a query rather than a procedural check that could lop of parts of the query execution path at runtime. Even UNION performs all sides of the UNION, so I don't think it can escape here either. A stored procedure with IF...ELSE...END logic may be the only answer.
richardtallent
Just a word of caution here since this isn't the first time I see the Nigel Ellis chat quoted: the chat transcript was published to TechNet on November 19, 2001. The QO had gone through significant changes in 2005 release, and more changes in 2008 as well. As a general rule, information from Craig's blog is significantly more up to date http://blogs.msdn.com/craigfr/.
Remus Rusanu
+4  A: 

SQL Boolean evaluation does NOT guarantee operator short-circuit. See On SQL Server boolean operator short-circuit for a clear example showing how assuming operator short circuit can lead to correctness issues and run-time errors.

On the other hand the very example in my link shows what does work for SQL Server: providing an access path that SQL can use. So, as with all SQL performance problems and questions, the real problem is not in the way the SQL text is expressed, but in the design of your storage. Ie. what indexes has the query optimizer at its disposal to satisfy your query?

Remus Rusanu
I agree--this isn't a short-cutting issue per se, but a problem where the query engine sends off work to check the value of `text1` even with both flags are set to 1. It's an indexed operation, but unnecessary *if* all selected records in SmallTable have those flags set. As I commented elsewhere, I think it's a problem with the query optimizer not being "smart enough" to avoid the branch of work on BigTable.text1 where all of the records in SmallTable don't require the text comparisons.
richardtallent
Unfortunately there are no procedural/conditional query tree operators. In other words, there is no operators that says 'if condition is true go down this path else this other path'. The query optimizare has to create a plan that satisfies all possible conditions, *even those that have a very low probability*. The query plan can do lots of tricks when ther are *deterministic* conditions, eg. two tables in a join with a trusted foreign relation *may* completely eliminate one table from the plan. This is where all T-SQL tricks come into picture, like use UNION instead of OR.
Remus Rusanu
That's a very good comment. I was struggling to word my understanding but this IMHO is a great explanation.
AdaTheDev
A: 

SQL Server usually grabs the subquery hint (though it's free to discard it):

SELECT      * 
FROM        (
            SELECT * FROM SmallTable where flag1 <> 1 or flag2 <> 1
            ) s
INNER JOIN  JoinTable j ON j.SmallTableID = s.ID
...
Andomar