views: 120

answers: 4

I have the SQL query:

SELECT ISNULL(t.column1, t.column2) as [result]
FROM t

I need to filter the data by the [result] column. Which of the two approaches below is better for performance:

WHERE ISNULL(t.column1, t.column2) = @filterValue

or:

WHERE t.column1 = @filterValue OR t.column2 = @filterValue

UPDATE: Sorry, I forgot to mention that column2 is always null if column1 is filled.

+3  A: 

Measure, don't guess! This is something you should be doing yourself, with production-like data. We don't know the make-up of your data and that makes a big difference.

Having said that, I wouldn't do it either way. I'd create another column, column3, to store column1 if it's non-NULL and column2 if column1 is NULL.

Then I'd have an insert/update trigger to populate that column correctly, index it, and use the screaming-banshee-speed query:

select t.column3 as [result] from t
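
For concreteness, a minimal T-SQL sketch of such a trigger and index (the key column id and the trigger/index names are assumptions for illustration, not from the question):

-- Hypothetical sketch: assumes table t has a primary key column "id".
CREATE TRIGGER trg_t_column3 ON t
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Recompute column3 only for the rows touched by this statement.
    UPDATE t
    SET column3 = ISNULL(i.column1, i.column2)
    FROM t
    INNER JOIN inserted AS i ON i.id = t.id;
END;
GO

-- Index the derived column so the filter becomes a straightforward seek.
CREATE INDEX ix_t_column3 ON t (column3);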

The vast majority of databases are read more often than written and it's better if this calculation is done as few times as possible (i.e., when the data changes, not every time you select it). If you want your databases to be scalable, don't use per-row functions.

It's perfectly valid to sacrifice disk space for speed and the triggers ensure that the data doesn't become inconsistent.

If adding another column and triggers is out of the question, I'd go for the OR solution, since it can often be split into two parallel queries by the smarter DBMS engines.

An alternative, which MarkB gave but has since deleted (so I'll have to go hunting for another good answer of his to upvote :-)), is to use UNION ALL. If your DBMS isn't quite smart enough to recognise OR as a chance for parallelism, it may be smart enough to recognise UNION ALL in that context. Something like:

select column1 as c from t where column1 is not NULL
union all
select column2 as c from t where column1 is NULL
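
Applied to the filter from the question, the same idea would look something like this (a sketch reusing the question's names):

select * from t where column1 = @filterValue
union all
select * from t where column1 is null and column2 = @filterValue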

But again, it depends on both your database and your data. A smart DBA would put the whole thing in a stored procedure so they could swap in a new method seamlessly should the data change its properties.

paxdiablo
I want to know the best practice for such filtering. The measurement results can vary dramatically depending on the data entered into the table.
klashar
+2 for "measure, don't guess". -1 for suggesting triggers. So that makes +1 overall. :)
Mark Byers
The best practice is to measure, and measure often. DBAs aren't just for setting up databases; they're also supposed to keep them tuned. The trigger solution is a good one, based on the way a lot of DBMSs work under the covers. But, for example, if your table never has more than 6 rows, it's a waste of time. It _always_ depends on the data :-)
paxdiablo
A: 

I think your replacement expression does not mean the same thing. Assume @filterValue is 2: then ISNULL(1, 2) = 2 is false, but 1 = 2 OR 2 = 2 is true. The expression you need looks more like:

 (c1 = filter) or ((c1 is null) and (c2 = filter));
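
A quick demonstration of the difference (a sketch; the literal values 1 and 2 stand in for column1 and column2 on a single row):

DECLARE @filterValue int;
SET @filterValue = 2;

-- ISNULL(1, 2) evaluates to 1, and 1 = 2 is false:
SELECT CASE WHEN ISNULL(1, 2) = @filterValue THEN 'match' ELSE 'no match' END;

-- The plain OR form is true, because 2 = 2:
SELECT CASE WHEN 1 = @filterValue OR 2 = @filterValue THEN 'match' ELSE 'no match' END;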

There is a chance that a server can answer this from the index on c1. The first part of the solution is an index scan over c1 = filter. The second part is a scan over c1 is null followed by a linear search for c2 = filter. I'd even say that a clustered index on (c1, c2) could work here.
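
As a sketch of that idea (the index name is an assumption, and a table can have only one clustered index, so treat this as illustrative):

-- Hypothetical composite index that can serve both branches of the predicate.
CREATE CLUSTERED INDEX ix_t_c1_c2 ON t (column1, column2);

SELECT t.column1, t.column2
FROM t
WHERE t.column1 = @filterValue
   OR (t.column1 IS NULL AND t.column2 = @filterValue);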

OTOH, you should measure rather than make assumptions like this; speculation usually doesn't work in SQL unless you have intimate knowledge of the implementation. For example, I'm pretty sure the query planner already knows that ISNULL(X, Y) can be decomposed into a boolean statement, with its implications for searching, but I would not rely on that; measure first and then decide what to do.

Luther Blissett
+1  A: 

On an MSSQL table (MSSQL 2000) with 13,000,000 rows and indexes on Col1 and Col2, I get the following results:

select top 1000000 * from Table1 with(nolock) where isnull(Col1,Col2) > '0'
-- Compile-Time: 4ms
-- CPU-Time: 18265ms
-- Elapsed-Time: 24882ms = ~25s

select top 1000000 * from Table1 with(nolock) where Col1 > '0' or (Col1 is null and Col2 > '0')
-- Compile-Time: 9ms
-- CPU-Time: 7781ms
-- Elapsed-Time: 25734ms = ~26s
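
For anyone who wants to reproduce numbers like these, SQL Server can report parse/compile and execution times itself; a sketch:

-- Print parse/compile time, CPU time, and elapsed time for each
-- statement that follows (plus logical/physical reads with IO).
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

select top 1000000 * from Table1 with(nolock) where isnull(Col1,Col2) > '0'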

The measured values are subject to strong fluctuations based on the workload of the server.

The first statement needs less time to compile but takes more CPU time to execute (clustered index scan).

It's important to know that many database engines have an optimizer that reorganizes the statement for better results and execution times. Ultimately, the optimizer will rewrite both statements into mostly the same form.

Floyd
A: 

I have measured the performance of both queries on SQL Server 2008 and got the following results:

Both approaches had almost the same Estimated Subtree Cost metric, but the OR approach had a more accurate value for the Estimated Number of Rows metric.

So the query optimizer will build a more appropriate execution plan for the OR approach than for the ISNULL approach.
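
Those metrics come from the estimated execution plan; one way to pull them without actually running the queries (a sketch using the question's names) is:

-- Return the estimated plan as XML instead of executing the batch;
-- the plan carries EstimatedTotalSubtreeCost and EstimateRows.
SET SHOWPLAN_XML ON;
GO

DECLARE @filterValue int;
SELECT * FROM t
WHERE t.column1 = @filterValue OR t.column2 = @filterValue;
GO

SET SHOWPLAN_XML OFF;
GO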

klashar