views:

204

answers:

3

Hi,

I have a table of strings in my database. I select on of them.

How I can find the most similar string in the rest of the table ?

+1  A: 

I think you are looking for the levenstein distance. The levenstein distance between 2 string, is how much adding/deleting/modifying char is needed to make the string equals.

Here's an implementation in Ruby

Clement Herreman
A: 

If you are after the phonetic similarity between two strings, you could use the SQL soundex function (there is also metaphone in Oracle).

This will convert the string passed to it into a 4 digit code (1 letter, 3 numbers iirc) which represents the phonetic sound of the word(s).

If you do this to both of the strings you are comparing, if they are phonetically similar then you can match the codes.

hermiod
A: 

Here is a simple implementation of the Levenshtein Distance Algorithm in Ruby :

def levenshtein(a, b)
  case
    when a.empty?: b.length
    when b.empty?: a.length
    else [(a[0] == b[0] ? 0 : 1) + levenshtein(a[1..-1], b[1..-1]),
          1 + levenshtein(a[1..-1], b),
          1 + levenshtein(a, b[1..-1])].min
  end
end

The Levenshtein distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.

Andreas Grech