views:

266

answers:

2

If your goal is to test if a string exists in a MySQL column (of type 'varchar', 'text', 'blob', etc) which of the following is faster / more efficient / better to use, and why?

Or, is there some other method that tops either of these?

INSTR( columnname, 'mystring' ) > 0

vs

columnname LIKE '%mystring%'
+6  A: 

FULLTEXT searches are absolutely going to be faster, as kibibu noted in the comments above.

However:

mysql> select COUNT(ID) FROM table WHERE INSTR(Name,'search') > 0;
+-----------+
| COUNT(ID) |
+-----------+
|     40735 | 
+-----------+
1 row in set (5.54 sec)

mysql> select COUNT(ID) FROM table WHERE Name LIKE '%search%';
+-----------+
| COUNT(ID) |
+-----------+
|     40735 | 
+-----------+
1 row in set (5.54 sec)

In my tests, they perform exactly the same. They are both case-insensitive, and generally they perform full-table scans, a general no-no when dealing with high-performance MySQL.

Unless you are doing a prefix search on an indexed column:

mysql> select COUNT(ID) FROM table WHERE Name LIKE 'search%';
+-----------+
| COUNT(ID) |
+-----------+
|         7 | 
+-----------+
1 row in set (3.88 sec)

In which case, the LIKE with only a suffix wildcard is much faster.

razzed
+1 for actually doing the experiment! Too much performance "wisdom" is based on instinct
kibibu
+1  A: 

In the case of a "front wilcard" (i.e. a "LIKE '%...'" predicate) as seems to be the case here, INSTR and LIKE should perform roughly the same.

When the wildcard is not a "front wildcard", the LIKE approach should be faster, unless the wildcard is not very selective.

The reason why the type of wildcard and its selectivity matter is that a predicate with INSTR() will systematically result in a table scan (SQL cannot make any assumptions about the semantics of INSTR), whereby SQL can leverage its understanding of the semantics of the LIKE predicate to maybe use an index to help it only test a reduced set of possible matches.

As suggested in comment under the question itself, a Full Text index will be much faster. The difference depends on the specific distribution of words within the text, and also the overall table size, etc. but expect anything from twice as fast to maybe 10 times as fast.

A possible downside of using at fulltext index, in addition to the general overhead for creating such an index, is that unless one is very careful in configuring this index (ex: defining the stop word list, using specific search syntax to avoid inflectional forms and the like...), there may be cases where the results provided by FullText will not be as expected. For example, searching for a "SAW" (a tool to cut wood), one may get a lot of hits for records including the verb "to see", in its various conjugated forms.
Of course, these linguistic-aware features of fulltext indexes can typically be overridden and also one may consider that such features are effectively a advantage, not a drawback. I just mention this here since we're comparing this to a plain wildcard search.

mjv
Thank you, very informative.
Grekker