tags:

views:

421

answers:

8

I'm looking for an efficient way (using PHP with a Mysql Database) to suggest alternative spelling for a query.

I know I can use services such as Yahoo's Spelling Suggestion but I want the suggestions to be based on what is currently available in the database.

For example: The user has to fill a form with a "City" field, and I want to make sure that everyone will use the same spelling for said city, (so I don't end up with people filling in "Pitsburgh" when what they mean is "Pittsburgh" ).

This was only an example but, basically I want to search what is already in the database for entries where the spelling is really close to what the user entered...

Any algorithm, tutorials or ideas on how to achieve this?

+1  A: 

I would do it as the user types and suggest by prefix (ala Google Suggest). A trie would be nice for this. It wouldn't help to correct misspelled first letters, but those are pretty rare.

perimosocordiae
AutoSuggest is definitely an interesting alternative! I'm pretty sure I considered it already and rejected the idea for a reason, but right now (at 4:40am) I can't seem to remember that reason :)
silvertab
A: 

Please, take a look to the Yahoo! UI Library Autocomplete Component. I think it is just what you're looking for. The section "Using DataSources" explains how to use different kind of data sources, including server side based ones like yours.

Guido
A: 

Have a look at Javascript Examples it lists 13 different autocompleting field code.

I've used something similar on one of my sites, I essentially have a div layer set up under the text box, as a user types this fires of an Ajax based HTTP request to my SQL query script which updates each letter they type. The div gets updated with any matching DB entries which the user can click on to select.

Katy
+1  A: 

MySQL has a built-in function to find the Levenshtein edit distance, it's quite slow though. I'd use the auto-complete function offered above, or simply edit entries after-the-fact every week or so.

MattW.
I wasn't aware of Levenshtein distance (which is also available as a php function!) Thanks for that! :)
silvertab
+1  A: 

Maybe this will help http://jquery.bassistance.de/autocomplete/demo/ It uses JQuery (client side) and php (serverside). The example feeds from an array but can be easily modified so it will use a MySQL database.

daniels
+1  A: 

Spelling alternatives are often implemented by using the Levenshtein distance between two words (the one the user typed, en the one inside, for example, your database)

here is the pseudocode for the algorithm (from wikipedia):

int LevenshteinDistance(char s[1..m], char t[1..n])
   // d is a table with m+1 rows and n+1 columns
   declare int d[0..m, 0..n]

   for i from 0 to m
       d[i, 0] := i
   for j from 0 to n
       d[0, j] := j

   for i from 1 to m
       for j from 1 to n
       {
           if s[i] = t[j] then cost := 0
                          else cost := 1
           d[i, j] := minimum(
                                d[i-1, j] + 1,     // deletion
                                d[i, j-1] + 1,     // insertion
                                d[i-1, j-1] + cost   // substitution
                            )
       }

   return d[m, n]

and here you can find the real implementation for all sorts of languages: http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance

Sven
+1  A: 

I've used the pspell http://uk.php.net/pspell package to do this. Take the search term, check the spelling. If its not OK, PSPELL will make suggestions.

You can even run the suggestions though your search, count the results, and then say: Your search for "foo" returned 0 results. Did you mean "baz" (12 results) or "bar" (3 result).

If you are worried about performance, only do this when a search returns 0 results.

Buzz
A: 

I believe SoundEx is a better fit than Levenshtein distance.

SoundEx is a function that produces a hash of a word/phrase based on the sound it would make in English. It is great for helping people who can't spell match the canonical spelling.

I have used it very successfully to find when two people registered the same company in a database with slightly different variants on the name.

SoundEx is built into MySql. Here is one tutorial on its use.

Oddthinking