views:

55

answers:

3

Hello,

my SQL query
SELECT keyword
FROM table
WHERE MATCH (keyword)
AGAINST ('eco*' IN BOOLEAN MODE);

matches cells with these words:economy,ecology,echoscopy(why?),echo(why?) etc.

other SQL query
SELECT keyword
FROM table
WHERE MATCH (keyword)
AGAINST ('eci*' IN BOOLEAN MODE);

matches cell with the word:echidna.

Yet both queries don't match word ectoplasm.

Why does echo, echoscopy match 'eco*' and echidna matches 'eci*'?

I see key element in this problem being letter combination "ch".

Why does it work this way and how can I avoid this kind of matching?

A: 

Maybe you can try this

SELECT keyword FROM table WHERE keyword LIKE 'eco%';
Sorry, can't use LIKE - it's much too slow.
Adomas
+1  A: 

The reason that it matches is that MATCH ... AGAINST is using regular expressions, and the * means, that the preceding char ("o") can be there from 0 to 9999999999999999999^ times. What you meant to match is

eco.*

Will match "eco" and "ecology" but not "echo".

eco.+

Will match "ecology" and "eco system" but not "eco" nor "echo".

Jan.
MATCH ... AGAINST does not use regular expressions, it does full text search. See Adomas's answer for the real explanation of what happened.
Avi
Did you notice Adoma is the OP? So I guess he already "noticed".. Regards.. :-)
Jan.
+1  A: 

The problem (feature?) was in the collation. "c" and "ch" were treated equal because of utf8_lithuanian_ci collation.

Changing collation to utf8_unicode_ci fixed the issue.

Adomas
Very interesting. Collation is definitely something to watch out for in MySQL...
Avi