views:

446

answers:

7

Do T-SQL queries in SQL Server support short-circuiting?

For instance, I have a situation where I have two database and I'm comparing data between the two tables to match and copy some info across. In one table, the "ID" field will always have leading zeros (such as "000000001234"), and in the other table, the ID field may or may not have leading zeros (might be "000000001234" or "1234").

So my query to match the two is something like: select * from table1 where table1.ID LIKE '%1234'

To speed things up, I'm thinking of adding an OR before the like that just says: table1.ID = table2.ID to handle the case where both ID's have the padded zeros and are equal.

Will doing so speed up the query by matching items on the "=" and not evaluating the LIKE for every single row (will it short circuit and skip the LIKE)?

+1  A: 

fix the database to be consistent

select * from table1 where table1.ID LIKE '%1234'

will match '1234', '01234', '00000000001234', but also '999991234'. Using LIKE pretty much guarantees an index scan (assuming table1.ID is indexed!). Cleaning up the data will improve performance significantly.

if cleaning up the data is not possible, write a user-defined function (UDF) to strip off leading zeros, e.g.

select * from table1 where dbo.udfStripLeadingZeros(table1.ID) = '1234'

this may not improve performance (since the function will have to run for each row) but it will eliminate false matches and make the intent of the query more obvious

EDIT: Tom H's suggestion to CAST to an integer would be best, if that is possible.

Steven A. Lowe
like does not guarentee table scan like '%1234' is a table scan. like '1234%' is not a table scan, assuming the column is indexed it will use the index to find 1234...
JoshBerke
@Josh: i said "index scan", not "table scan".
Steven A. Lowe
@Steven yep I see no idea why but when I read it first time I could have sworn it said table...I'm getting old I think;-)
JoshBerke
But either way I think prefixing with a wild card character would result in a table scan maybee that's why I got them confused.
JoshBerke
+1  A: 

If the ID is purely numeric (as your example), I would reccomend (if possible) changing that field to a number type instead. If the database is allready in use it might be hard to change the type though.

Stein G. Strindhaug
Yeah, I can't change the source table.
Sam Schutte
+2  A: 

You want to make sure that at least one of the tables is using its actual data type for the IDs and that it can use an index seek if possible. It depends on the selectivity of your query and the rate of matches though to determine which one should be converted to the other. If you know that you have to scan through the entire first table, then you can't use a seek anyway and you should convert that ID to the data type of the other table.

To make sure that you can use indexes, also avoid LIKE. As an example, it's much better to have:

WHERE
     T1.ID = CAST(T2.ID AS VARCHAR) OR
     T1.ID = RIGHT('0000000000' + CAST(T2.ID AS VARCHAR), 10)

than:

WHERE
     T1.ID LIKE '%' + CAST(T2.ID AS VARCHAR)

As Steven A. Lowe mentioned, the second query might be inaccurate as well.

If you are going to be using all of the rows from T1 though (in other words a LEFT OUTER JOIN to T2) then you might be better off with:

WHERE
     CAST(T1.ID AS INT) = T2.ID

Do some query plans with each method if you're not sure and see what works best.

The absolute best route to go though is as others have suggested and change the data type of the tables to match if that's at all possible. Even if you can't do it before this project is due, put it on your "to do" list for the near future.

Tom H.
+5  A: 

You could add a computed column to the table. Then, index the computed column and use that column in the join.

Ex:

Alter Table Table1 Add PaddedId As Right('000000000000' + Id, 12)
Create Index idx_WhateverIndexNameYouWant On Table1(PaddedId)

Then your query would be...

select * from table1 where table1.PaddedID ='000000001234'

This will use the index you just created to quickly return the row.

G Mastros
Good. No ORs, not UDFs in line, indexable etc etc
gbn
Also a good suggestion.
Sam Schutte
Just keep in mind that this will cause a (hopefully minor) hit for inserts and updates. Good idea though as long as that's not a problem.
Tom H.
if you can't fix the data to be consistent, this is the next best solution
Steven A. Lowe
+3  A: 

How about,

table1WithZero.ID = REPLICATE('0', 12-len(table2.ID))+table2.ID

In this case, it should able to use the index on the table1

Dennis Cheung
+4  A: 

SQL Server does NOT short circuit where conditions. it can't since it's a cost based system: How SQL Server short-circuits WHERE condition evaluation .

Mladen Prajdic
+1  A: 

Just in case it's useful, as the linked page in Mladen Prajdic's anwer explains, CASE clauses are short-circuit evaluated.

Daniel Renshaw