views:

194

answers:

4

Hello,

I'm looking for an algorithm that takes as arguments two strings, source and destination, and returns the steps required to transform the source string to the destination. Something that takes Levenshtein distance one step farther.

E.g.,

Input: source "abc", dest "abbc"
Output: insert 'b' at position 1 in source

Input: source "abc", dest "ac"
Output: delete 'b' at position 1 in source

Thanks very much.

+1  A: 

Take a look at the diff algorithms on Wikipedia.

TskTsk
A: 

You might try going through first and lining up all the letters in pairs. After you've paired as many as possible, the inserts and deletes should be obvious.

abcde
| /|
acbdd

So you remove the b & e and add a b & d

Remember the lines can't cross.

Bill K
actually, your method won't guarantee an optimal solution. the most optimal solution may be to do the unintuitive and make to substitutions rather than one insert, for instance. it depends on how you weight the operations (insert and delete are usually very expensive). the well-known algorithms will take this into account. they will usually rely on a dynamic programming solution.
San Jacinto
You are correct. I was going under the assumption that this was a school assignment and simplicity would be more valuable than accuracy. If you want accuracy, scan for the largest chunk that matches, then take what remains on either side of that chunk (for both strings) and recurse.
Bill K
A: 

I would try to find a way to send it to an existing well-tested diff utility and use the results of that diff, e.g. diff -e or diff -n.

shoover
+2  A: 

Just use the algorithm as shown on wikipedia, understand it and make the modifications that are necessary. I does solve your problem, you probably just didn't know it, and didn't record your answer along the way.

Styggentorsken
Will do, much appreciated.
kujawk