tags:

views:

141

answers:

3

In the Levenshtein Distance algorithm, what does this line do?:

    d[i][j] = Minimum (d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);

Although it gets the minimum of all of those values, why is cost added to the end, and why do we have + 1 at the end of each array indexer (first two parameters)?

A: 

if you readed further, you'd know: We can give different penalty costs to insertion, deletion and substitution. We can also give penalty costs that depend on which characters are inserted, deleted or substituted.

you'd e.g. include some >0 cost value in the deletion part of the formula, if you suppose than a letter deletion makes a bigger difference than a letter replacement

mykhal
+1  A: 

From the article:

                (
                  d[i-1, j] + 1,  // deletion
                  d[i, j-1] + 1,  // insertion
                  d[i-1, j-1] + 1 // substitution
                )

The algorithm's purpose is to find the cheapest path to convert one string (or list) to another string. The "cost" for any given operation is considered in the line you've referenced. In yours, "deletes" are considered to be a cost of 1, "inserts" also 1, and "substitution" at a variable cost.

James Kolpack
+1  A: 

Here is a formula:

alt text

m(a,b) = if a==b then 0 else 1

The line with "min" is the recursive formula itself, the others are non-recursive cases to which the recursion leads.

Your line is using "caching" the results in array. Try to look at it as at:

 d(i, j) = Minimum (d(i-1, j)+1, d(i, j-1)+1, d(i-1, j-1) + cost);

cost is m(a,b), and it's zero if a==b and one in the other case

valya