I'm looking for a solution that can process large blocks of user input text and match it against a set of strings that I've got stored in a database. The only problem is that the strings in the user input text are frequently misspelled. (The strings in the database are spelled correctly.)

I know modern search engines suggest results that account for misspellings, but I have not a clue what those algorithms are called or if they even apply to my situation.

Firstly, I need to know the names of those algorithms (or what they are generally called). Secondly, I need to know how to apply them. Any ideas?

+3  A: 

Use libaspell to find misspelled words, then refine its suggestions with a clustering algorithm (k-means?) or with Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance) for strings. Your code should also handle incomplete, non-dictionary words if you are searching something like a parts catalog or a scientific book database.
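
As a rough illustration of the Levenshtein part only (not libaspell or clustering), here is a minimal Python sketch. The function names, the `known` list standing in for the database strings, and the `max_distance` cutoff of 2 are all assumptions made for the example, not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_match(word, candidates, max_distance=2):
    """Return the candidate nearest to word, or None if nothing
    is within max_distance edits (threshold is an assumption)."""
    best, best_dist = None, max_distance + 1
    for candidate in candidates:
        d = levenshtein(word.lower(), candidate.lower())
        if d < best_dist:
            best, best_dist = candidate, d
    return best


# Hypothetical stand-in for the correctly spelled database strings.
known = ["carburetor", "alternator", "radiator"]
print(closest_match("carburator", known))
print(closest_match("xyz", known))
```

Comparing every input word against every stored string is quadratic work, so for large inputs you would probably want to pre-filter candidates (for example by length or first letter) before computing distances.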

mhambra