I have a site that lists movies. Naturally, people make spelling mistakes when searching for movies, and some titles contain apostrophes, spell out numbers as words, etc.

How do I get my search script to overlook these errors? I probably need something a little more intelligent than WHERE mov_title LIKE '%keyword%'.

It was suggested that I use a fulltext search engine, but all of those look really complicated, and I feel that building one into my application will be hell on earth. If I do have to use one, what's the least invasive option, the one that will be most painless to integrate into existing code?

+1  A: 

I think you'll have to use an external fulltext search engine. MySQL just isn't good at fulltext search. I'd say you should give Lucene a go (tutorials). Zend Framework has an API that plugs into Lucene, making it easier to learn and use.

PatrikAkerstrand
Does Lucene have typo recognition built in? What about sphinxsearch?
Yegor
Yes, Lucene supports Fuzzy search queries. The algorithm is based on Levenshtein distance. The Java API looks like this: http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/FuzzyQuery.html
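
To give a feel for what a Levenshtein-based fuzzy match does, here is a minimal Python sketch. It is not Lucene's API or implementation; the function names `levenshtein` and `fuzzy_match` and the `max_distance` threshold are made up for illustration, and comparing the query against every title like this would only be reasonable for a small list.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts,
    deletes, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, titles: List[str], max_distance: int = 2) -> List[str]:
    """Hypothetical helper: titles within max_distance edits of the query,
    closest first. Comparison is case-insensitive."""
    q = query.lower()
    scored = [(levenshtein(q, title.lower()), title) for title in titles]
    return [title for distance, title in sorted(scored) if distance <= max_distance]


if __name__ == "__main__":
    print(fuzzy_match("gladiater", ["Gladiator", "Goodfellas", "Alien"]))
```

With this threshold, a query like "gladiater" is one substitution away from "Gladiator" and so still matches, while unrelated titles fall outside the cutoff.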
PatrikAkerstrand
Is there no other way except using Lucene?
Yegor
A: 

I've used neither PHP nor MySQL, but an alternative to full-text search might be Soundex searches.
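
For illustration, here is a rough Python sketch of the classic American Soundex rules (first letter plus three digits). The `soundex` function below is written from scratch for this example and is not taken from any particular library; database implementations may differ in details such as code length.

```python
# Consonant groups that share a Soundex digit.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of word, or "" if it has no letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit:
            if digit != previous:  # adjacent letters with the same digit count once
                code += digit
            previous = digit
        elif c not in "HW":
            previous = ""  # a vowel separates repeats; H and W do not
    return (code + "000")[:4]  # pad or truncate to letter + 3 digits


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Gladiator", "Gladiater"):
        print(name, soundex(name))
```

The idea would be to store the code of each title word alongside the title and compare codes instead of raw strings, so that "Gladiater" and "Gladiator" end up with the same key.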

Antony
A: 

Presuming that you use MySQL: it has no built-in functionality capable of doing this.

This means you will have to implement a full-text search yourself, or use a third-party full-text search tool.

  • If you implement it yourself, you should look into the Metaphone or Double Metaphone algorithms (I'd recommend them over Soundex, which is not nearly as good at this type of task) to store phonetic representations of all your words. However, building your own full-text search is no task for the faint-hearted. Don't attempt it if you don't consider yourself a database wizard.
  • If you want a third-party tool, Lucene is the way to go. It has been ported to tons of different languages/platforms, including PHP, so you don't have to use Java.
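
As a rough sketch of the do-it-yourself route, the Python below stores a phonetic-style key for every word of every title and looks queries up by key. Everything here is hypothetical: `crude_key` is a deliberately simplistic stand-in (it is not Metaphone), and `build_index` and `search` are illustrative names. In a real setup you would swap in a proper Metaphone or Double Metaphone implementation and keep the keys in a database table rather than in memory.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set


def crude_key(word: str) -> str:
    """Stand-in for a real phonetic algorithm (NOT Metaphone): strip
    punctuation, keep the first character, drop later vowels, and skip a
    consonant equal to the last one kept."""
    cleaned = re.sub(r"[^a-z0-9]", "", word.lower())
    if not cleaned:
        return ""
    key = cleaned[0]
    for c in cleaned[1:]:
        if c in "aeiouy":
            continue
        if c != key[-1]:
            key += c
    return key


def build_index(titles: List[str],
                key: Callable[[str], str] = crude_key) -> Dict[str, Set[str]]:
    """Map each word's key to the set of titles containing such a word."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for title in titles:
        for word in title.split():
            k = key(word)
            if k:
                index[k].add(title)
    return index


def search(query: str, index: Dict[str, Set[str]],
           key: Callable[[str], str] = crude_key) -> List[str]:
    """Return titles that contain a key match for every query word."""
    result: Optional[Set[str]] = None
    for word in query.split():
        k = key(word)
        if not k:
            continue
        matches = set(index.get(k, set()))
        result = matches if result is None else result & matches
    return sorted(result or [])


if __name__ == "__main__":
    idx = build_index(["Gladiator", "Ocean's Eleven", "Goodfellas"])
    print(search("gladiater", idx))
    print(search("oceans eleven", idx))
```

Because the key function strips punctuation, the apostrophe in "Ocean's" no longer gets in the way; numbers spelled out as words would still need a separate normalization step.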
thomasrutter