I want to compare several strings to each other and find the ones that are most similar. Is there a library, method, or best practice that would tell me which strings are more similar to which other strings? For example:
- "The quick fox jumped" -> "The fox jumped"
- "The quick fox jumped" -> "The fox"
The comparison should report that the first pair is more similar than the second.
I guess I need some method such as:
double similarityIndex(String s1, String s2)
Is there such a thing somewhere?
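To make the request concrete, here is a minimal sketch of the kind of method I have in mind. It assumes normalized Levenshtein (edit) distance as the similarity measure, which is only one possible choice and my own assumption. The class name `StringSimilarity` and the helper `levenshtein` are made up for illustration and do not come from any library.

```java
// Hypothetical helper class; not part of any existing library.
final class StringSimilarity {
    private StringSimilarity() {}

    /**
     * Returns a score in [0.0, 1.0], where 1.0 means the strings are identical.
     * Assumption: similarity = 1 - editDistance / length of the longer string.
     */
    static double similarityIndex(String s1, String s2) {
        int longest = Math.max(s1.length(), s2.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(s1, s2) / longest;
    }

    /** Classic dynamic-programming edit distance, keeping only two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String base = "The quick fox jumped";
        System.out.printf("%.2f%n", similarityIndex(base, "The fox jumped"));
        System.out.printf("%.2f%n", similarityIndex(base, "The fox"));
    }
}
```

With this particular measure, the first pair from my example scores 0.70 and the second 0.35, which is the ordering I am after. I don't know whether edit distance is the best measure for this, so I would prefer an existing, tested library over rolling my own.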
EDIT: Why am I doing this? I am writing a script that compares the output of an MS Project file to the output of a legacy system that handles tasks. Because the legacy system has a very limited field width, descriptions are abbreviated when the values are entered. I want a semi-automated way to find which entries from MS Project are similar to the entries in the legacy system, so I can get the generated keys. It has drawbacks, since the matches still have to be checked manually, but it would save a lot of work.
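The matching step I imagine would look roughly like the sketch below: for each abbreviated legacy description, pick the MS Project entry with the highest similarity score and then review the pairing by hand. Everything here is an assumption for illustration: the class `ClosestMatch`, the sample task names, and the choice of a Dice coefficient over character bigrams as the score (any other similarity function could be swapped in).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class; names and sample data are made up for illustration.
final class ClosestMatch {
    private ClosestMatch() {}

    /** Collects the distinct lower-cased two-character substrings of s. */
    static Set<String> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < t.length(); i++) {
            out.add(t.substring(i, i + 2));
        }
        return out;
    }

    /** Dice coefficient on bigram sets: 2 * |common| / (|x| + |y|), in [0.0, 1.0]. */
    static double dice(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        int total = x.size() + y.size();
        if (total == 0) {
            return 0.0; // neither string has a bigram, so there is nothing to compare
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return 2.0 * common.size() / total;
    }

    /** Returns the highest-scoring candidate, or null if the list is empty. */
    static String closest(String legacy, List<String> candidates) {
        String best = null;
        double bestScore = -1.0;
        for (String candidate : candidates) {
            double score = dice(legacy, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up MS Project task names and one made-up abbreviated legacy entry.
        List<String> projectTasks = Arrays.asList(
                "Install server hardware",
                "Review project budget",
                "Test network cabling");
        System.out.println(closest("Inst. srv hardw", projectTasks));
    }
}
```

The idea is that the script would only propose the best candidate for each legacy entry, and I would confirm or reject it before copying the generated key.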