I am wondering what the performance of a query would be like using the LIKE keyword with the bare wildcard as the value, compared to having no WHERE clause at all.

Consider a WHERE clause such as "WHERE a LIKE '%'". This will match all possible values of the column 'a'. How does this compare to not having the WHERE clause at all?

The reason I ask this is that I have an application where there are some fields that the user may specify values to search on. In some cases the user would like all the possible results. I am currently using a single query like this:

SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?

A value of '%' can be supplied for either parameter to match all possible values of a and/or b. This is convenient since I can use a single named query in my application. I wonder what the performance considerations are. Does the query optimizer reduce LIKE '%' to simply "match all"? I realize that because I'm using a named query (prepared statement), that may also affect the answer, and that the answer is likely database-specific. So, specifically, how would this work in Oracle, MS SQL Server, and Derby?

The alternative approach would be to use four separate queries (the original plus three variants), chosen based on whether the user supplied a wildcard:

When a is a wildcard:

SELECT * FROM TableName WHERE b LIKE ?

When b is a wildcard:

SELECT * FROM TableName WHERE a LIKE ?

When both a and b are wildcards:

SELECT * FROM TableName

When there are no wildcards:

SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?

Obviously having a single query is the simplest and easiest to maintain. I would rather use just the one query if performance will still be good.
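(For concreteness, here is a sketch of the single-query approach, using Python with SQLite purely for illustration. SQLite is not one of the databases asked about, and the table, columns, and data below are invented, but the parameter-binding pattern is the same.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (a TEXT, b TEXT)")
conn.executemany("INSERT INTO TableName VALUES (?, ?)",
                 [("apple", "blue"), ("apricot", "black"), ("cherry", "blue")])

def search(a=None, b=None):
    # Treat a missing search term as "match everything" by binding '%',
    # so a single prepared statement covers every combination of inputs.
    return conn.execute(
        "SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?",
        (a or "%", b or "%"),
    ).fetchall()

print(search())                  # all three rows
print(search(a="ap%"))           # apple and apricot
print(search(a="ap%", b="bl%"))  # apple and apricot again
```

Leaving a parameter unset simply binds '%', which is exactly the convenience being weighed against the multi-query alternative below.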

+1  A: 

Any DBMS worth its salt would strip out LIKE '%' clauses before even trying to run the query. I'm fairly certain I've seen DB2/z do this in its execution plans.

The prepared statement shouldn't make a difference since it should be turned into real SQL before it gets to the execution engine.

But, as with all optimization questions, measure, don't guess! DBAs exist because they constantly tune the DBMS based on actual data (which changes over time). At a bare minimum, you should time (and get the execution plans) for all variations with suitable static data to see if there's a difference.

I know that queries like:

select c from t where ((1 = 1) or (c = ?))

are optimized to remove the entire WHERE clause before execution (on DB2, anyway). And before you ask: the construct is useful when you need to neutralize the WHERE clause but still keep the parameter placeholder (we use BIRT with JavaScript to modify the queries for wildcards).
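(Side note: this construct is not quite the same as LIKE '%' with respect to NULLs, because TRUE OR NULL evaluates to TRUE, so NULL rows are kept. A quick check in SQLite via Python, with an invented one-column table:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("x",), (None,)])

# The placeholder must still be bound, even though its value can never matter.
n = conn.execute(
    "SELECT COUNT(*) FROM t WHERE ((1 = 1) OR (c = ?))", ("anything",)
).fetchone()[0]
print(n)  # 2: the NULL row is included, unlike with LIKE '%'
```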

paxdiablo
I don't think this is true, especially if the column being compared is NULLable. LIKE '%' should not return those rows where the column is NULL, so stripping the criteria away should not suddenly reintroduce them to the results.
Aaron Bertrand
The DBMS should (and it *is* "should") be able to tell whether the column is nullable and, if it is, skip that optimization. In any case, that would render the OP's question moot, since they wouldn't be able to drop the clause either.
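(The NULL behaviour being debated here is easy to verify. A minimal check in SQLite via Python, with an invented table; the standard SQL three-valued logic applies: NULL LIKE '%' evaluates to NULL, which is not true, so the row is dropped.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT)")  # nullable column
conn.executemany("INSERT INTO t VALUES (?)", [("x",), ("y",), (None,)])

total    = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
like_all = conn.execute("SELECT COUNT(*) FROM t WHERE a LIKE '%'").fetchone()[0]
not_null = conn.execute("SELECT COUNT(*) FROM t WHERE a IS NOT NULL").fetchone()[0]

print(total, like_all, not_null)  # 3 2 2: LIKE '%' silently drops the NULL row
```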
paxdiablo
+1  A: 

Depending on how the LIKE predicate is structured and on the field you're testing, you might need a full table scan. Semantically, a '%' might imply a full table scan, but SQL Server does all sorts of internal optimization on queries. So the question becomes: does SQL Server optimize a LIKE predicate formed with '%' and throw it out of the WHERE clause?

Paul Sasik
+5  A: 

SQL Server will generally see

WHERE City LIKE 'A%'

and treat it as

WHERE City >= 'A' AND City < 'B'

...and happily use an index seek if appropriate. I say 'generally', because I've seen it fail to do this simplification in certain cases.

If someone's trying to do:

WHERE City LIKE '%ville'

...then an index seek will be essentially impossible.

But something as simple as:

WHERE City LIKE '%'

will be considered equivalent to:

WHERE City IS NOT NULL
Rob Farley
DB2 (at least) has the concept of reverse indexes, where '%ville' is easily optimized (by storing the reversed values in the index and internally rewriting the query to 'elliv%'). You can emulate the same on other DBMSes with an extra column and insert/update triggers.
paxdiablo
Sure, but then %ville% becomes more complicated. If you're looking for whole words, then FullText searching becomes a nicer option.
Rob Farley
+1 for pointing out that `LIKE '%'` will only return rows with non-null values.
Martin B
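(The seek-versus-scan behaviour described in this answer can be observed in other engines too. A sketch using SQLite's EXPLAIN QUERY PLAN via Python; the table and index are invented. Note the PRAGMA, which SQLite needs before it will rewrite a prefix LIKE into an index range, and be aware that the exact plan wording varies between SQLite versions.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT)")
conn.execute("CREATE INDEX idx_city_name ON city(name)")
# SQLite only rewrites LIKE 'A%' into an index range scan when LIKE is
# case-sensitive (or the index is declared COLLATE NOCASE).
conn.execute("PRAGMA case_sensitive_like = ON")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

print(plan("SELECT * FROM city WHERE name LIKE 'A%'"))     # expect SEARCH (index seek)
print(plan("SELECT * FROM city WHERE name LIKE '%ville'")) # expect SCAN (no seek possible)
```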
+3  A: 

You can use whatever query analysis the DBMS offers (e.g. EXPLAIN for MySQL, SET SHOWPLAN_ALL ON (or one of the other methods) for MS SQL Server, and EXPLAIN PLAN FOR for Oracle) to see how the query will be executed.

outis
A: 

What if a column has a non-NULL blank value? Your query will probably match it.

If this is a query for a real-world application, try the full-text indexing features of most modern SQL databases. The performance issues will become insignificant.

A simple if statement:

if (A and B) search on a and b
else if (A) search on a
else if (B) search on b
else tell the user they didn't specify anything

is trivial to maintain and is much easier to understand than relying on assumptions about the LIKE operator. You are probably going to branch like that in the UI anyway when you display the results ("Your search for A found x" or "Your search for A B found...").

james
A: 

I'm not sure of the value of using a prepared statement with the kind of parameters you're describing. The reason is that you might fool the query optimizer into preparing an execution plan that would be completely wrong depending on which of the parameters were '%'.

For instance, if the statement were prepared with an execution plan using the index on column A, but the parameter for column A turned out to be '%' you may experience poor performance.

Larry Lustig
A: 

A WHERE clause with LIKE '%' as the only predicate will behave exactly the same as no WHERE clause at all.

This is wrong; only non-NULL values would match.
IronGoofy
thanks for the correction!
I meant it will behave the same from a performance perspective, but I obviously wasn't clear enough. Even that is not true in some cases.
+2  A: 

Derby also offers tools for examining the actual query plan that was used, so you can run experiments using Derby and look at the query plan that Derby chose. You can run Derby with -Dderby.language.logQueryPlan=true, and Derby will write the query plan to derby.log, or you can use the RUNTIMESTATISTICS facility, as described here: http://db.apache.org/derby/docs/10.5/tuning/ctundepth853133.html

I'm not sure if Derby will strip out the A LIKE '%' ahead of time, but I also don't think that the presence of that clause will introduce much of a slowdown in the execution speed.

I'd be quite interested to see the actual query plan output that you get in your environment, with and without the A LIKE '%' clause in place.

Bryan Pendleton
+2  A: 

Oracle 10gR2 does not appear to perform a special optimisation for this situation, but it does recognise that LIKE '%' excludes nulls.

create table like_test (col1)
as select cast(dbms_random.string('U',10) as varchar2(10))
from dual
connect by level <= 1000
/
insert into like_test values (null)
/
commit
/

exec dbms_stats.gather_table_stats(user,'like_test')

explain plan for
select count(*)
from   like_test
/
select plan_table_output from table(dbms_xplan.display)
/
explain plan for
select count(*)
from   like_test
where  col1 like '%'
/
select plan_table_output from table(dbms_xplan.display)
/
explain plan for
select count(*)
from   like_test
where  col1 is not null
/
select plan_table_output from table(dbms_xplan.display)
/

... giving ...

Plan hash value: 3733279756

------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |           |     1 |            |          |
|   2 |   TABLE ACCESS FULL| LIKE_TEST |  1001 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------

... and ...

Plan hash value: 3733279756

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |    10 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |           |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL| LIKE_TEST |  1000 | 10000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1" LIKE '%')

... and ...

Plan hash value: 3733279756

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |    10 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |           |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL| LIKE_TEST |  1000 | 10000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COL1" IS NOT NULL)

Note the cardinality (Rows) on the TABLE ACCESS FULL line: 1001 with no predicate, but 1000 with either LIKE '%' or IS NOT NULL, since the NULL row is excluded.

David Aldridge
+1  A: 

One aspect that I think is missing from the discussion is the fact that the OP wants to use a prepared statement. At the time the statement is prepared, the database/optimizer cannot apply the simplifications others have mentioned, and so cannot optimize away the a LIKE '%', because the actual value is not known at prepare time.

Therefore:

  • when using prepared statements, have four different statements available (0, only a, only b, both) and use the appropriate one when needed
  • if you stick to just one statement, see whether you get better performance without a prepared statement (although then it would be pretty easy to simply not include the 'empty' conditions)
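(A sketch of the first option: rather than maintaining four statements by hand, the WHERE clause can be assembled from whichever conditions the user actually supplied, so the optimizer never sees a redundant LIKE '%'. Python with SQLite and invented data, purely for illustration.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (a TEXT, b TEXT)")
conn.executemany("INSERT INTO TableName VALUES (?, ?)",
                 [("apple", "blue"), (None, "black")])

def search(a=None, b=None):
    # Build only the conditions the user actually supplied.
    conditions, params = [], []
    if a is not None:
        conditions.append("a LIKE ?")
        params.append(a)
    if b is not None:
        conditions.append("b LIKE ?")
        params.append(b)
    sql = "SELECT * FROM TableName"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return conn.execute(sql, params).fetchall()

print(len(search()))       # 2: no WHERE clause at all, NULL row included
print(len(search(a="%")))  # 1: LIKE '%' still excludes the NULL row
```

This also sidesteps the semantic difference: search() returns rows with NULLs, while binding '%' does not.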
IronGoofy
+1  A: 

I was hoping there would be a textbook answer to this, but it sounds like it largely varies between database types. Most of the responses indicated that I should run a test, so that is exactly what I did.

My application primarily targets the Derby, MS SQL and Oracle databases. Since Derby can be run embedded and is easy to set up, I tested performance on it first. The results were surprising. I tested the worst-case scenario against a fairly large table, ran the test 1000 times, and averaged the results.

Query 1:

SELECT * FROM TableName

Query 2 (With values of a="%" and b="%"):

SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?

Query 1 average time: 178ms

Query 2 average time: 181ms

So performance on Derby is almost the same between the two queries.

Chris Dail