For example, the Soundex algorithm is optimized for English. Is there a more universal algorithm that would apply across large families of languages?
A:
SOUNDEX is indeed English-oriented. Two other algorithms that take a wider variety of phonetic differences into account are Double Metaphone and NYSIIS.
They produce encodings in a much larger space of possible codes than SOUNDEX does. Double Metaphone in particular includes reduction rules designed expressly to handle alternate pronunciations from languages other than English.
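For contrast, here is a minimal sketch of American Soundex, the baseline being compared against. It assumes ASCII input (other characters are simply dropped) and follows the common variant in which H and W do not separate letters with the same code. It shows both limitations mentioned above: every code is one letter plus three digits, and the six digit groups reflect English consonant sounds. Double Metaphone and NYSIIS are not implemented here.

```python
import string

# Soundex digit groups: letters with similar English consonant sounds share a digit.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name: str) -> str:
    """American Soundex sketch; non-ASCII letters are ignored (assumption)."""
    letters = [c for c in name.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own code suppresses an identical code right after it.
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:          # collapse runs of the same code
                digits.append(code)
            prev = code
        elif ch not in "HW":
            prev = ""                 # vowels and Y separate equal codes
        # H and W leave prev unchanged, so equal codes around them merge
    # Pad with zeros and truncate: always one letter followed by three digits.
    return (first + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```

Both names in the example encode to R163. Because every key has the form letter plus three digits, there are at most 26 × 1,000 possible codes, so unrelated names collide frequently. That small space is what the larger encodings of NYSIIS and Double Metaphone are meant to improve on.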
I gave a presentation on fuzzy string matching recently; the slides may be helpful.
Kyle Burton
2008-09-24 15:51:59
The link to your slides is broken (404).
John Machin
2009-09-26 05:29:25