ansaurus

Question

In SQL Server, what performs better: where ProcessedDate is null or where Processed = 0

Answer 1

+2 A:

You should not have both.

Pick the one that matches your domain model best.

With the right index, any performance difference will negligible (assuming that the relative distribution of 'processed' to 'not processed' means the optimizer will pick an index in the first place).

Mitch Wheat 2010-02-23 00:32:38

Answer 2

+2 A:

As with most SQL performance questions, the answer will be driven by your table schema, not by your SQL text. As it is right now, both version require a full table scan, so they are equaly bad.

A non clustered index on ProcessedDate may help the first version, but since you ask for * in the projection, the index tipping point may kick in and still hit a full scan.

On the other hand a clustered index with the leftmost key Processed will always work, and such a clustered index structure is common in queue processing (and having a 'ProcessedDate' column is a clear give away that your table is used as a queue).

But then a leftmost column on ProcessedDate for the clustered index would also work just like the Processed bit.

The lesson to take home: is not the query, is the schema. Always.

Remus Rusanu 2010-02-23 00:36:53

@Remus Rusanu: +1. very nice answer. Was just checking your web site. Some very useful info...

Mitch Wheat 2010-02-23 01:18:33

I'd say it is a good answer, expect it is a good answer to a different question. I've asked a specific question about the performance of two clauses, and received an entry level overview of indexing. I asked the specific question because I haven't seen anything about it elsewhere, and next time someone else is wondering about it, all they'll see is this answer, which doesn't help.How about a followup? Do either of you know the answer to the question I asked?

David 2010-02-23 03:40:18

Answer 3

+3 A:

They are identical as far as I can tell with equal sets of data. You can run my example below and look at the execution plans to check. You'll see that the filter element where it tests the predicate takes up the same proportion of time in each. Even if indexes are added it wouldn't make one more efficient than the other as they are basically doing the same thing.

WITH Test (MyInt, MyNull)
AS
(
SELECT 1 AS MyInt, Null AS MyNull

UNION ALL

SELECT 
    MyInt + 1,
    CASE
        WHEN MyInt % 2 = 0 THEN Null
        ELSE MyInt
    END
FROM Test
WHERE MyInt < 10000
)

SELECT * FROM Test
WHERE MyNull IS NULL
OPTION (MAXRECURSION 32767);

WITH Test (MyInt, MyBit)
AS
(
SELECT 1 AS MyInt, 0 AS MyBit

UNION ALL

SELECT 
    MyInt + 1,
    CASE
        WHEN MyInt % 2 = 0 THEN 0
        ELSE 1
    END
FROM Test
WHERE MyInt < 10000
)

SELECT * FROM Test
WHERE MyBit = 0
OPTION (MAXRECURSION 32767);

I know my example is contrived but it uses equal numbers of nulls and 0's. Even if you re-write the CTE for each example into table variables you should see the same performance.

Joe Swan 2010-02-23 11:34:16

Answer 4

+1 A:

One advantage of the ProcessedDate is null predicate is that you don't risk getting bitten by (explicit or implicit) parametrization.

If you use where Processed = 0, SQL server might internally parametrize this to Processed = @p, and the generate the query plan based on the assumption that @p = 0. However, if you somewhere do the same query but where Processed = 1, you might get a plan for @p = 1, and since the data is likely to be skewed, it's likely that the plans won't be the same.

erikkallen 2010-02-23 11:56:34

So does this mean that there is potentially extra overhead in the second execution plan generation for the bit version? Does the null-checking version generate one execution plan that is larger? I'm not sure I see what would happen if the query were run with ProcessedDate is not null.

David 2010-02-23 23:01:18

IS NULL / IS NOT NULL cannot be parameterized. But don't care whether the plan is "larger" (whatever that means) or whether it needs to be regenerated. Of all things that take time in a database call, query plan generation is not one of them.

erikkallen 2010-02-24 11:03:21

ansaurus

tags:

views:

answers:

In SQL Server, what performs better: where ProcessedDate is null or where Processed = 0

related questions