I'm using Lawrence Philips' Double Metaphone algorithm with great success, but I have found the odd "unexpected result" for some combinations.

Does anyone have additions or changes to the algorithm (for any part of it) that they wouldn't mind sharing, or even just combinations they've found that don't work as expected?

For example, I had issues with:

  • Peashill and Bushley (both encode to PXL, so they match)
  • Rockliffe and Rockcliffe (RKLF and RKKL respectively, so they don't match)
A:

All Soundex, Metaphone and variant schemes will occasionally give results that don't match what you expect. This is unavoidable: they can be regarded as more or less simple hash algorithms with special information-preserving properties, so they will sometimes produce collisions when you'd rather they didn't, and sometimes produce differences when you'd rather they didn't.
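To make the hashing point concrete, here is a minimal sketch of classic American Soundex. Note that this is Soundex, not Double Metaphone; it is swapped in only because it is short enough to show in full, and the table and function names are illustrative. Run on the names from the question, the collisions land differently: Rockliffe and Rockcliffe collide (both R241), while Peashill and Bushley get P240 and B240.

```python
# American Soundex digit groups; vowels and Y get no digit, H and W are skipped.
SOUNDEX_GROUPS = {
    "BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6",
}
SOUNDEX_CODES = {ch: digit for letters, digit in SOUNDEX_GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code ("" if `name` has no letters)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]                       # the first letter is kept as-is
    prev = SOUNDEX_CODES.get(letters[0], "")  # ...but its digit still blocks a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                          # H/W don't separate same-coded letters
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code                           # a vowel resets prev, so repeats across it count
    return result.ljust(4, "0")               # pad short codes with zeros


for name in ("Peashill", "Bushley", "Rockliffe", "Rockcliffe"):
    print(name, soundex(name))
# Peashill P240 / Bushley B240 / Rockliffe R241 / Rockcliffe R241
```

Each scheme draws its collision boundaries in different places, which is why none of them matches intuition everywhere.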

One possible way of improving things is to use 'synonym rings': lists of words that should be regarded as synonyms, independent of spelling. I encountered them in the context of name matching. For example, variants on Chaudri included:

CHAUDARY CHAUDERI CHAUDERY CHAUDHARY CHAUDHERI CHAUDHERY CHAUDHRI CHAUDHRY CHAUDHURI CHAUDHURY CHAUDHY CHAUDREY CHAUDRI CHAUDRY CHAUDURI CHAWDHARY CHAWDHRY CHAWDHURY CHDRY CHODARY CHODHARI CHODHOURY CHODHRY CHODREY CHODRY CHODURY CHOUDARI CHOUDARY CHOUDERY CHOUDHARI CHOUDHARY CHOUDHERY CHOUDHOURY CHOUDHRI CHOUDHRY CHOUDHURI CHOUDHURY CHOUDREY CHOUDRI CHOUDRY CHOUDURY CHOUWDHRY CHOWDARI CHOWDARY CHOWDHARY CHOWDHERY CHOWDHRI CHOWDHRY CHOWDHURI CHOWDHURRYY CHOWDHURY CHOWDORY CHOWDRAY CHOWDREY CHOWDRI CHOWDRURY CHOWDRY CHOWDURI CHOWDURY CHUDARY CHUDHRY CHUDORY COWDHURY

Jonathan Leffler
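A synonym ring can be sketched as a lookup from each spelling to a ring identifier, so that two names match when they map to the same ring. This is a minimal sketch: the ring below holds just a handful of the Chaudri variants listed above, and `build_ring_index` and `same_ring` are hypothetical names, not part of any library.

```python
from collections.abc import Iterable

# A few of the Chaudri variants from the list above; a real system would
# presumably load many rings from curated data (an assumption, not shown here).
SYNONYM_RINGS = [
    {"CHAUDRI", "CHAUDHRY", "CHAUDHARY", "CHOUDHURY", "CHOWDHURY", "CHODRY"},
]


def build_ring_index(rings: Iterable[set[str]]) -> dict[str, int]:
    """Map every spelling (upper-cased) to the index of the ring containing it."""
    index: dict[str, int] = {}
    for ring_id, ring in enumerate(rings):
        for spelling in ring:
            index[spelling.upper()] = ring_id
    return index


RING_INDEX = build_ring_index(SYNONYM_RINGS)


def same_ring(a: str, b: str, index: dict[str, int] = RING_INDEX) -> bool:
    """True only when both names are known spellings in the same synonym ring."""
    ring_a = index.get(a.upper())
    return ring_a is not None and ring_a == index.get(b.upper())


print(same_ring("Chaudri", "Chowdhury"))  # True
print(same_ring("Chaudri", "Peashill"))   # False
```

Because the ring lookup is independent of any phonetic code, one option is to check the rings first and fall back to the Double Metaphone comparison only for names that appear in no ring.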
This is a good suggestion. I have used a dynamic version of this workaround in the past: in the UI we had a "not a good match" option when searching, which added the entry to a matching-exceptions table. We also allowed users to key in a variant, which was also stored once a match was found.
Godeke
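The dynamic workaround in this comment could look roughly like the following: a phonetic comparison whose result can be overridden by user-maintained sets of rejected pairs ("not a good match") and keyed-in variants. This is a hypothetical sketch; `NameMatcher`, its methods, and the in-memory sets standing in for database tables are all assumptions, and the encoder is a stand-in that simply looks up the Double Metaphone codes quoted in the question.

```python
from typing import Callable

# Stand-in encoder data: the Double Metaphone primary codes quoted in the question.
QUESTION_CODES = {"PEASHILL": "PXL", "BUSHLEY": "PXL",
                  "ROCKLIFFE": "RKLF", "ROCKCLIFFE": "RKKL"}


class NameMatcher:
    """Phonetic matching with user-maintained overrides (hypothetical design)."""

    def __init__(self, encode: Callable[[str], str]) -> None:
        self.encode = encode
        self.rejected: set[frozenset[str]] = set()  # stands in for the exceptions table
        self.accepted: set[frozenset[str]] = set()  # stands in for stored keyed-in variants

    @staticmethod
    def _pair(a: str, b: str) -> frozenset[str]:
        return frozenset((a.upper(), b.upper()))    # order- and case-insensitive key

    def reject(self, a: str, b: str) -> None:
        """User flagged a search hit as "not a good match"."""
        pair = self._pair(a, b)
        self.accepted.discard(pair)
        self.rejected.add(pair)

    def accept(self, a: str, b: str) -> None:
        """User keyed in `b` as a variant of `a`."""
        pair = self._pair(a, b)
        self.rejected.discard(pair)
        self.accepted.add(pair)

    def matches(self, a: str, b: str) -> bool:
        pair = self._pair(a, b)
        if pair in self.rejected:
            return False
        if pair in self.accepted:
            return True
        return self.encode(a) == self.encode(b)     # fall back to the phonetic codes


matcher = NameMatcher(lambda name: QUESTION_CODES[name.upper()])
print(matcher.matches("Peashill", "Bushley"))       # True: both PXL
matcher.reject("Peashill", "Bushley")
print(matcher.matches("Peashill", "Bushley"))       # False: overridden
matcher.accept("Rockliffe", "Rockcliffe")
print(matcher.matches("Rockcliffe", "Rockliffe"))   # True: stored variant
```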
A: 

Regular Metaphone returns different codes for Peashill and Bushley:

  • Peashill: PXL
  • Bushley: BXL
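One way to act on this (a suggestion, not something the answer proposes) is to require two schemes to agree before accepting a match, so that a collision in one can be vetoed by the other. A minimal sketch, using only the codes quoted in this thread as stand-in encoders; `all_agree` and `ENCODERS` are hypothetical names.

```python
from typing import Callable, Sequence

# Stand-in encoder data: the codes quoted in this thread for the two schemes.
DOUBLE_METAPHONE_CODES = {"PEASHILL": "PXL", "BUSHLEY": "PXL"}
METAPHONE_CODES = {"PEASHILL": "PXL", "BUSHLEY": "BXL"}

ENCODERS: list[Callable[[str], str]] = [
    lambda n: DOUBLE_METAPHONE_CODES[n.upper()],
    lambda n: METAPHONE_CODES[n.upper()],
]


def all_agree(a: str, b: str, encoders: Sequence[Callable[[str], str]]) -> bool:
    """Accept a match only if every encoder gives both names the same code."""
    return all(encode(a) == encode(b) for encode in encoders)


print(all_agree("Peashill", "Bushley", ENCODERS))  # False: Metaphone vetoes the PXL collision
```

Requiring agreement trades recall for precision: it removes some false positives but will also reject pairs that only one scheme recognizes.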