views:

1017

answers:

4

What are the limitations or gotchas or antipatterns or pitfalls?

It seems pretty attractive, apparently you can create a search engine with almost no work. But it cannot be without its problems...

what are your experiences?

+2  A: 

For large tables, you will need to increase your buffer size and cache limit in your MySQL config file.

Also the MATCH() columns you use in the search need to be the same as the columns in the index.

Ólafur Waage
+2  A: 

In my opinion, the greatest drawback is that the MySQL full text indexing is limited to MyISAM tables. As oppsosed to InnoDB tables, those lack a lot of important features, e.g. transactions.

cg
You can work around this by creating a slave machine. The table type on the slave doesn't have to match the master, which means your innodb master could have a myisam slave with full-text searching.http://dev.mysql.com/doc/refman/5.1/en/replication-solutions-diffengines.html
txyoji
Thanks for the hint, txyoji, I wasn't aware of that. Might be an option to keep in mind.
cg
+3  A: 

it cannot be without its problems...

It certainly isn't!

Any search term composed purely of blocked words will silently fail. Words can be blocked due to min/max length restrictions and/or the stopword file.

I found the default stopword file far too aggressive, it was preventing many valid searches. Also the default minimum length of 4 was kicking in very often for acronyms people might want to search for. I reduced the ft_min_word_len to 3 and removed the stoplist completely (ft_stopword_file=''). Doc: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html

You could also examine the search query to see if it contains only <4-letter words, and fall back to a LIKE search in that case. There's no such easy way to get around the stoplist at an application level.

The selection of ‘word characters’ may not meet your needs, and it's tricky to change. For example searching for “Terry” won't match “Terry's”. In general there is no support for any kind of stemming, so “biscuit” won't match “biscuits” either.

Finally, as cg mentioned, there is no support for InnoDB. In this day and age, you don't want to be putting all your data in a MyISAM table.

If you have the storage to spare, what you can do is put the main, canonical version of the data in an InnoDB table, and then create a separate MyISAM table that contains a copy of the freetext content, purely for use as searchbait. You do have to update both tables on a change, but if the MyISAM table loses integrity then at least you only lose the ability to search over the rows concerned, instead of bumming up the real live data and getting application errors.

You can then, if you have the cycles to spare, implement your own text processing on the searchbait and query words to get around some of the above limitations. For example you can escape characters you want to be word-characters, remove characters you don't want to be word-characters, and perform simple manual English stemming.

bobince
A: 

I just want to double check, there is no way you can use stemming with mysql 5.0? Wow thats awful...

Arnold