Hello. This may be a hard question to answer, but I'm researching something and was wondering whether anyone knows of "lesser-known" string similarity metrics (see this page for examples of well-known ones). I've looked at Wikipedia, and SourceForge hosts a nice library called SimMetrics with a collection of string metric algorithms. Has anyone researched or come across a string similarity algorithm that caught your attention because it is not widely used?

Thank you.

+1  A: 

There is also the class of phonetic algorithms (such as Soundex) that might add to your list.
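
To give an idea of what a phonetic algorithm does, here is a minimal sketch of American Soundex in Python. The function name `soundex` and the choice to ignore non-ASCII letters are my own assumptions for illustration; real libraries differ in how they handle edge cases.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex code of name (a sketch)."""
    # Assumption: keep only ASCII letters and ignore everything else.
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is emitted again
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


print(soundex("Robert"), soundex("Rupert"))
```

Names that sound alike, such as "Robert" and "Rupert", end up with the same code, so comparing codes gives a crude similarity test.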

JP Alioto
+1  A: 

This page (LingPipe) gives some tips on string comparison. It covers Damerau-Levenshtein distance, the Needleman-Wunsch algorithm, Jaccard distance, Jaro-Winkler distance, and TF/IDF distance. Here, "distance" is understood as a measure of how similar two strings are.

At the end of the page, it lists references and also provides a ready-to-use Java implementation (download & license).
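
As a rough illustration of one of these, Jaccard distance compares two sets of features. The sketch below is in Python and uses character bigrams as the features, which is my own assumption; LingPipe may tokenize differently, and the names `bigrams` and `jaccard_distance` are hypothetical.

```python
def bigrams(s):
    """Return the set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the bigram sets of a and b."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0  # assumption: two strings without bigrams count as identical
    return 1.0 - len(x & y) / len(x | y)


print(jaccard_distance("night", "nacht"))
```

For "night" and "nacht" only the bigram "ht" is shared out of 7 distinct bigrams, so the distance is 6/7.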

Guido
+1  A: 

Check out http://us.php.net/manual/en/function.levenshtein.php, including all the "See Also" references and all the user comments.
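
If you just want to see the idea behind Levenshtein distance, here is a minimal sketch in Python using the usual dynamic-programming approach with unit costs for insertion, deletion, and substitution. This is not the PHP implementation, just an illustration; the function name is my own.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter string.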

Chloe