I'm looking to implement fuzzy search for a small PHP/MySQL application. Specifically, I have a database with about 2400 records (records added at a rate of about 600 per year, so it's a small database). The three fields of interest are street address, last name and date. I want to be able to search by one of those fields, and essentially have tolerance for spelling/character errors. i.e., an address of "123 Main Street" should also match "123 Main St", "123 Main St.", "123 Mian St", "123 Man St", "132 Main St", etc. and likewise for name and date.
The main issues I have with answers to other similar questions:
- It's impossible to define synonyms for every possible incorrect spelling, forget doing so for dates and names.
- Lucene, etc. seems very heavy-weight for such a limited search data set (call it a maximum of 5,000 records, 3 fields per record).
- Just doing something with wildcards doesn't seem logical with all of the possible spelling errors.
Any suggestions? I know it isn't going to be possible to do natively with MySQL, but since the data set is so limited, I'd like to keep it relatively simple... perhaps a PHP class that gets all of the records from the DB, uses some sort of comparison algorithm, and returns the IDs of the similar records?
Thanks, Jason