views:

243

answers:

2

1) Why do we add 1 on these line?

    d[i-1, j] + 1, // deletion 
    d[i, j-1] + 1, // insertion 

The line

if s[i] = t[j] then cost := 0

        else cost := 1 

should take into account deleted/lower word lengths, or am I missing something?

2) Also, the comments state deletion and insertion. Am I right in thinking that it's checking for deleted characters in both words (the integers j/i representing the length of words), because a lower value will represent deleted characters.

The code used is here (because it is pseudo code and I have no language specific issues, this thread is not in any language category):

http://www.iterasi.net/openviewer.aspx?sqrlitid=z0cloj7xhk-ce0f72v4cjq

+1  A: 

1) These lines compute the distance in the case of deletion, in the case of insertion, and the one using "cost" in case of a substitution...

deletion and insertion effectively count as "1" in the distance calculation, hence the +1.

We can believe there was a substitution only if the characters are different hence the "cost=0" if both chars are equal...

The new distance is then the minimum distance between these 3 hypothesis so you don't always add 1 ...

2) if I compute the distance between "FooBar" and "FoBaWhatever" I have some character deletions even if the second string is longer than the first one ...

Of course if the second string is shorter than the second ( FooBar -> FoBa ) I will find some deletions but cannot know in advance where they are...

siukurnin
+1  A: 
S.Lott
Yeah I did actually read that link. Good answer.However, one last thing:In the minimum function, there is +1 for a cell and +cost for a cell. Surely 1 and cost are the same value (1) as cost is never greater than 1 and is not 0 as that would result in the if statement executing (if cost == 0 etc). I don't understand this logic?
dotnetdev
No. The cost is not always 1. It can be much, much larger than one if the adjacent letters aren't good matches. When you first start, you assume that the last letter of an n-character word is the result of n insertions; It's cost is initially n until your comparison shows it to be less because some characters actually matched.
S.Lott