views:

102

answers:

5

I have a blacklist of people that should never be contacted. When I want to see if a person is in this list, I do the following:

-- Query 1
SELECT * 
FROM bldb.dbo.blacklist l
WHERE l.matchcode
    = dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert')

The query runs very fast, since there is an index on matchcode column, and fn_matchcode is deterministic.

Think of matchcode as a compressed form of address and name, which helps me not to be affected from typos in street names etc. It consists of 22 chars: 13 for the address, 9 for the name. When I want to see if any person in 1 Sesame Street, 12345 is in blacklist, I do the following:

-- Query 2
SELECT * 
FROM bldb.dbo.blacklist l
WHERE LEFT(l.matchcode,13)
    = LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13)

This runs extremely long...

On the contrary, this runs much faster:

-- Query 3
SELECT * 
FROM bldb.dbo.blacklist l
WHERE LEFT(l.matchcode,13)
    = (SELECT LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13))

That means, the right hand side of the where condition gets calculated for every row! But why? The UDF is deterministic. Is it LEFT(), which is not deterministic?

EDIT:

The answers so far claimed that it is because the index does not get used. However, it is still not clear to me why the following happens.

When I write the query like this:

-- Query 4
SELECT * 
FROM bldb.dbo.blacklist
WHERE matchcode LIKE LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13) + '%'

it still takes several minutes to finish. Please note that the fn_matchcode just does some string manipulation and returns instantly.

When I hardcode the result of fn_matchcode into the query:

-- Query 5
SELECT * 
FROM bldb.dbo.blacklist
WHERE matchcode LIKE '12345SSMSTRT1%'

it takes a couple of milliseconds! How would you explain that?

+1  A: 

I'd use

SELECT * 
FROM bldb.dbo.blacklist l
WHERE l.matchcode LIKE 
  LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13) + '%'

To allow the index on matchcode to be used. That doesn't answer your question though but is too long for a comment.

Martin Smith
Good idea, but it didn't help... I thought the engine would be clever enough to use the index when I say LEFT but not RIGHT...
ercan
No - It doesn't do that for `left`. I'm not sure why not because it would presumably be an easy optimisation but definitely LIKE 'abc%' will allow it to consider a seek (whether it does one will depend on the statistics) (Unless your actual query is using a column rather than constants? in which case you'll get a scan whatever)
Martin Smith
+1  A: 

This is an issue of sargability - when you use a function like left in the where clause, the query can't use the indexes which slows it down.

HLGEM
How would you explain that my last query runs fast? Also, I tried Martin Smith's tip, which would make the matchcode sargable, but it did not help. If this was the case, then putting the right hand side in (SELECT ...) would not help, would it?
ercan
"last query" refers to query 3 here...
ercan
+1  A: 

That means, the right hand side of the where condition gets calculated for every row!

No, you're thinking of correlated subqueries - which this is not.

The last example is fast because the optimizer sees it as a join (due to the SELECT), vs the WHERE clause in the previous examples.

While using a function on a column will render an index on that column useless, cutting down the number of characters on a VARCHAR column (say first 10 of VARCHAR(150)) could be faster due to less to compare. INTs are 4 bytes regardless, but that's not the case for string based data types...

OMG Ponies
So you say that the optimizer doesn't know beforehand how many characters are there going to be before '%', so it decides not to use the index. Indeed, when I force index use in query 4, it runs as fast as query 5! But it's still a mystery to me why query 3 runs fast, although the index on matchcode is not used because of LEFT()...
ercan
OMG Ponies
+3  A: 

Following the update in your question can you look at the two execution plans for your queries #4 and #5 and see whether it does a clustered index scan for one and a non clustered index seek for the other? I wonder if it is because it knows the statistics for the literal at compile time but not for the function call. As it has no idea that only a handful of records will be returned it errs on the side of caution to avoid doing a whole load of bookmark lookups.

If that is the case then does the following help?

SELECT * 
FROM bldb.dbo.blacklist WITH (FORCESEEK)
WHERE matchcode LIKE 
  LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13) + '%'
Martin Smith
Plan for #4 shows a "Table scan" whereas #5 shows "index seek". `FORCESEEK` was not recognized as a hint, but instead I used `WITH INDEX(indexname)`. It works! See my comment to OMG Ponies' post. You are both right. When I call the function, it is yet unknown where the '%' will be, so optimizer does not take the risk of doing an index seek for a string like 'A%'.
ercan
Ah, `FORCESEEK` is SQL2008 only. Glad you got a hint that worked though! And good point - so far as it knows at compile time the function could return a leading `%` anyway.
Martin Smith
Is it wrong that this resolution made me aroused?
Yellowfog
@Yellow It's only wrong if you didn't upvote it.
Martin Smith
A: 

For a query to run fast, there must be an index for the thing being queried. If you are going to be querying based upon a value computed from one or more columns, you need to have an indexed column containing the result of that computation.

supercat
The OP does have such a column. The `matchcode` column.
Martin Smith