tags:

views:

66

answers:

1

Are there examples of algorithms for determining the edit distance between 2 strings when there are extra primitive operations (the standard being insert, delete, transpose & substitute a single character) which operate on a whole substring. Examples of possible extra primitive operations are:

1) The duplicate function - which copies any substring and inserts it where needed

2) The move function - which moves any substring to a new location

Using these, if d & D are the Levenstein distance but D also includes 1) & 2), d("Sheep", "SheepBeep") = 4 (as 4 inserts must be made), but D("Sheep", "SheepBeep") = 2 (insert a "B" then duplicate the "eep"). Similarly, d("CarDog", "DogCar") = 6 but D("CarDog", "DogCar") = 1 by 2).

Are there (simple) modifications that can be made to the Levenstein Distance algorithm in order to achieve this?

+1  A: 

Having only insert, delete and move results in an NP-hard problem. It seems very unlikely that adding duplicate, transpose, and substitute makes it easier again. So polynomial-time approaches like the Levenshtein dynamic programming are very unlikely to work.

Similar problems have also been considered in Bioinformatics under terms like "genome rearrangement" and "translocation distance".

Falk Hüffner