
I've just found out that the execution plan performance of the following two SELECT statements is massively different:

select * from your_large_table
where LEFT(some_string_field, 4) = '2505'

select * from your_large_table
where some_string_field like '2505%'

In the execution plan, the two queries account for 98% and 2% of the batch cost respectively. Bit of a difference in speed, then. I was actually shocked when I saw it.

I've always written LEFT(xxx) = 'yyy' because it reads well. I actually found this out by checking the LINQ-generated SQL against my hand-crafted SQL. I assumed the LIKE command would be slower, but it is in fact much, much faster.

My question is: why is LEFT() slower than LIKE '..%'? Aren't they identical, after all?

Also, is there a CPU hit by using LEFT()?

+6  A: 

Using function calls in WHERE clauses has a huge impact, because SQL Server must calculate the result for every row. LIKE, on the other hand, is a built-in language feature that is highly optimized.
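
One way to see the per-row evaluation (and the CPU cost asked about above) is to count the calls. This is a sketch using Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption: SQLite's planner differs in detail. SQLite has no LEFT(), so a hypothetical Python function py_left() plays its role and counts its own invocations:

```python
import sqlite3

# Hypothetical stand-in for T-SQL's LEFT(); it also counts how often it runs.
stats = {"calls": 0}

def py_left(s, n):
    stats["calls"] += 1
    return None if s is None else s[:n]

conn = sqlite3.connect(":memory:")
conn.create_function("py_left", 2, py_left)
conn.execute(
    "CREATE TABLE your_large_table (id INTEGER PRIMARY KEY, some_string_field TEXT)"
)
# 30,000 rows holding the strings '0', '1', ..., '29999'
conn.executemany(
    "INSERT INTO your_large_table (some_string_field) VALUES (?)",
    [(str(i),) for i in range(30_000)],
)

rows = conn.execute(
    "SELECT * FROM your_large_table WHERE py_left(some_string_field, 4) = '2505'"
).fetchall()
print(len(rows), stats["calls"])  # 11 30000
```

Only 11 rows match ('2505' and '25050' to '25059'), yet the function runs once for every row in the table.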

Dan Sydner
+3  A: 

If you apply a function to an indexed column, the database no longer uses the index (at least in Oracle, anyway).
So I'm guessing that your example field some_string_field has an index on it which doesn't get used for the query with LEFT.

hamishmcn
This isn't entirely true. The index can still be used, but perhaps in a different way. If the expected number of matches on the predicate is small and the index is physically much smaller than the table, then a full index scan or fast full index scan could be leveraged.
David Aldridge
Interesting, thanks for the info
hamishmcn
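
David Aldridge's point can be reproduced in SQLite, used here through Python's sqlite3 as a stand-in (an assumption; Oracle's full and fast full index scans are not the same operation). When a query needs only columns that the index holds, the planner can scan the narrower index end to end even though substr() rules out a seek:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_large_table ("
    " id INTEGER PRIMARY KEY, some_string_field TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX ix_field ON your_large_table (some_string_field)")

def plan(sql):
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# id (the rowid) and some_string_field are both stored in ix_field,
# so the index alone can answer the query.
covering = plan(
    "SELECT id, some_string_field FROM your_large_table"
    " WHERE substr(some_string_field, 1, 4) = '2505'"
)
print(covering)
```

The plan reports a scan using the covering index ix_field: the index is still used, just not for a range lookup.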
+15  A: 

It looks like the expression LEFT(some_string_field, 4) is evaluated for every row of a full table scan, while the "like" expression will use the index.

Optimizing "like" to use an index if it is a front-anchored pattern is a much easier optimization than analyzing arbitrary expressions involving string functions.
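
The difference shows up directly in SQLite's EXPLAIN QUERY PLAN. This sketch uses Python's sqlite3 rather than SQL Server (an assumption), with substr() standing in for LEFT(). The column is declared TEXT COLLATE NOCASE because SQLite only applies its LIKE optimization when the index collation matches LIKE's default case-insensitivity and, for a pattern that starts with a digit, when the column has TEXT affinity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_large_table ("
    " id INTEGER PRIMARY KEY,"
    " some_string_field TEXT COLLATE NOCASE,"
    " payload TEXT)"
)
conn.execute("CREATE INDEX ix_field ON your_large_table (some_string_field)")

def plan(sql):
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

substr_plan = plan(
    "SELECT * FROM your_large_table WHERE substr(some_string_field, 1, 4) = '2505'"
)
like_plan = plan(
    "SELECT * FROM your_large_table WHERE some_string_field LIKE '2505%'"
)
print(substr_plan)  # a full SCAN of the table
print(like_plan)    # a SEARCH using ix_field with a range on some_string_field
```

Exact plan wording varies between SQLite versions, but the substr() query scans the table while the front-anchored LIKE seeks on the index.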

mfx
+1  A: 

Why do you say they are identical? They might solve the same problem, but their approach is different. At least it seems like that...

The query using LEFT optimizes the test, since it already knows the length of the prefix, etc., so in a C/C++/... program, or without an index, an algorithm using LEFT to implement that particular LIKE behavior would be the fastest. But unlike most non-declarative languages, a SQL database does a lot of optimizations for you. For example, LIKE is probably implemented by first looking for the % sign; if the % turns out to be the last character of the pattern, the query can be optimized in much the same way as you did with LEFT, but directly using an index.

So, indeed I think you were right after all, they probably are identical in their approach. The only difference being that the db server can use an index in the query using LIKE because there is not a function transforming the column value to something unknown in the WHERE clause.
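
The rewrite described above can be sketched as a function that turns a front-anchored pattern into a range an index can seek on. prefix_range() is a hypothetical helper that assumes case-sensitive matching, code-point ordering and no ESCAPE clause; real engines also have to handle collations (SQLite documents a similar rewrite as its "LIKE optimization"):

```python
from typing import Optional, Tuple

def prefix_range(pattern: str) -> Optional[Tuple[str, str]]:
    """Turn a pattern like 'xxx%' into (lo, hi) so that `col LIKE pattern`
    is equivalent to `lo <= col < hi`, or return None if that is impossible."""
    if not pattern.endswith("%"):
        return None                      # no trailing wildcard: not a prefix match
    prefix = pattern[:-1]
    if not prefix or "%" in prefix or "_" in prefix:
        return None                      # leading or embedded wildcards: no single range
    last = ord(prefix[-1])
    if last == 0x10FFFF:
        return None                      # cannot increment the largest code point
    # Every string starting with `prefix` sorts before `hi`, and no other
    # string sorts inside [prefix, hi).
    hi = prefix[:-1] + chr(last + 1)
    return prefix, hi

print(prefix_range("2505%"))  # ('2505', '2506')
print(prefix_range("%2505"))  # None
```

So `some_string_field LIKE '2505%'` can be evaluated as `some_string_field >= '2505' AND some_string_field < '2506'`, which is an ordinary index range.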

FredV
The '%' sign is a wildcard character for LIKE, Fred.
Kevin Fairchild
Umm, I can't disagree. My point was that the db probably already optimizes "a like 'xxx%'" into "left(a,3) = 'xxx'", but that doesn't matter because the database can use the index, so it will always be faster anyway.
FredV
+11  A: 

More generally speaking, you should never wrap a column in a function on the left side of a comparison in a WHERE clause. If you do, SQL Server can't seek on an index for that column: it has to evaluate the function for every row of the table. The goal is to make sure that your WHERE clause is "sargable" (i.e. usable as a Search ARGument against an index).

Some other examples:

Bad: Select ... WHERE isNull(FullName,'Ed Jones') = 'Ed Jones'
Fixed: Select ... WHERE ((FullName = 'Ed Jones') OR (FullName IS NULL))

Bad: Select ... WHERE SUBSTRING(DealerName,1,4) = 'Ford'
Fixed: Select ... WHERE DealerName Like 'Ford%'

Bad: Select ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30
Fixed: Select ... WHERE OrderDate < DateAdd(mm,-30,GetDate()) 

Bad: Select ... WHERE Year(OrderDate) = 2003
Fixed: Select ... WHERE OrderDate >= '2003-1-1' AND OrderDate < '2004-1-1'
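
To check that a rewrite like the Year() one keeps the same meaning while becoming sargable, here is a sketch against SQLite via Python's sqlite3 (an assumption; the examples above are T-SQL). SQLite has no date type, so OrderDate is stored as ISO-8601 text, which sorts chronologically, and strftime('%Y', ...) stands in for Year(); the Orders table and ix_orderdate index are hypothetical:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, OrderDate TEXT, Amount REAL)")
conn.execute("CREATE INDEX ix_orderdate ON Orders (OrderDate)")
start = date(2002, 1, 1)
# One order per day from 2002-01-01 onwards, stored as 'YYYY-MM-DD' text.
conn.executemany(
    "INSERT INTO Orders (OrderDate, Amount) VALUES (?, ?)",
    [((start + timedelta(days=d)).isoformat(), 10.0) for d in range(4 * 365)],
)

non_sargable = "SELECT * FROM Orders WHERE strftime('%Y', OrderDate) = '2003'"
sargable = ("SELECT * FROM Orders "
            "WHERE OrderDate >= '2003-01-01' AND OrderDate < '2004-01-01'")

def plan(sql):
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

same_rows = (sorted(conn.execute(non_sargable).fetchall())
             == sorted(conn.execute(sargable).fetchall()))
print(same_rows)           # True
print(plan(non_sargable))  # a full scan of Orders
print(plan(sargable))      # a range search using ix_orderdate
```

Both queries return the same 365 rows, but only the range version is answered by searching ix_orderdate.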
BradC
Typo in the 2nd line, it's not quite the same.
Robert Wagner
Great examples! Thanks to you, I don't have to ask my own question now. :)
Ecyrb
+1  A: 

What happened here is either that the RDBMS is not capable of using an index for the LEFT() predicate but is capable of using one for the LIKE, or it simply made the wrong call about which access method would be more appropriate.

Firstly, it may be true for some RDBMSs that applying a function to a column prevents an index-based access method from being used, but that is not a universal truth, nor is there any logical reason why it needs to be. An index-based access method (such as Oracle's full index scan or fast full index scan) might be beneficial but in some cases the RDBMS is not capable of the operation in the context of a function-based predicate.
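
As an illustration that a function on a column need not rule out index access, SQLite (3.9 and later) can index an expression directly. The sketch below uses Python's sqlite3, with hypothetical table and index names; the analogous features are function-based indexes in Oracle and indexes on computed columns in SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_large_table ("
    " id INTEGER PRIMARY KEY, some_string_field TEXT, payload TEXT)"
)
# Index the expression itself rather than the bare column.
conn.execute(
    "CREATE INDEX ix_prefix ON your_large_table (substr(some_string_field, 1, 4))"
)

detail = " | ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM your_large_table"
        " WHERE substr(some_string_field, 1, 4) = '2505'"
    )
)
print(detail)  # a SEARCH using ix_prefix
```

The WHERE clause has to use the same expression as the index definition for SQLite to match them up.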

Secondly, the optimiser may simply get the arithmetic wrong in estimating the benefits of the different available access methods. Assuming that the system can perform an index-based access method, it first has to estimate the number of rows that will match the predicate, either from statistics on the table, statistics on the column, by sampling the data at parse time, or by using a heuristic rule (e.g. "assume 5% of rows will match"). Then it has to assess the relative costs of a full table scan and the available index-based methods. Sometimes it will get the arithmetic wrong, sometimes the statistics will be misleading or inaccurate, and sometimes the heuristic rules will not be appropriate for the data set.
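
That arithmetic can be illustrated with a deliberately crude cost model. Everything here (the formula, the parameter names, the numbers) is hypothetical and is not the model of any real optimizer; it only shows how a bad row estimate, such as a blanket 5% heuristic, can flip the choice of access method:

```python
import math

def choose_access_path(table_rows, rows_per_page, est_selectivity,
                       multiblock_read=8, index_height=3):
    """Hypothetical toy cost model returning (access path, estimated cost).

    Full scan: read every table page, `multiblock_read` pages per I/O request.
    Index access: descend the B-tree, then assume one random page read per
    matching row (worst case: matching rows scattered across the table).
    """
    table_pages = math.ceil(table_rows / rows_per_page)
    full_scan_cost = table_pages / multiblock_read
    index_cost = index_height + est_selectivity * table_rows
    if index_cost < full_scan_cost:
        return "index", index_cost
    return "full scan", full_scan_cost

# 1,000,000 rows at 100 rows per page; suppose the true selectivity is 0.01%.
print(choose_access_path(1_000_000, 100, 0.0001)[0])  # index
# Same table, but the optimizer falls back to a "5% of rows" guess.
print(choose_access_path(1_000_000, 100, 0.05)[0])    # full scan
```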

The key point is to be aware of a number of issues:

  1. What operations can your RDBMS support?
  2. What would be the most appropriate operation in the case you are working with?
  3. Is the system's choice correct?
  4. What can be done to allow the system to perform a more efficient operation (e.g. add a missing NOT NULL constraint, update the statistics, etc.)?

In my experience this is not a trivial task, and is often best left to experts. Or on the other hand, just post the problem to Stackoverflow -- some of us find this stuff fascinating, dog help us.

David Aldridge