I need to compare 2 sequences and find an edit distance. Edits can include deletion and insertion operations (with modification weight 1 per symbol), and block move operations (with weight 0.1 per symbol)
For example:
A B C D E F G H
F G H A B C Y D X E
Block FGH was moved here.
Is there any existing algorithm to solve this task efficiently?
views:
51answers:
3You might try A technique for isolating differences between files (via here):
An algorithm which uses the 'move' operator is described in P. Heckel's 1978 paper
(Sorry for the scribd interface, but I guess the paper has not been OCR'd.)
Yes; there are many algorithms and theory relating to biology; genome alignment and chromosome rearrangement. Without knowing your data, it's really hard to mention something more specific. I mention pancake sorting as a measure of rearrangement on another stackoverflow post, there are also a few other great options (compression, specifically). Of course that method will not be able to break apart your data into blocks. Dealing with small sequence data you should have no problem generating all groupings.
The abstract for Block Edit Models for Approximate String Matching looks promising.