views:

51

answers:

3

I need to compare 2 sequences and find an edit distance. Edits can include deletion and insertion operations (with modification weight 1 per symbol), and block move operations (with weight 0.1 per symbol)
For example:
A B C D E F G H
F G H A B C Y D X E
Block FGH was moved here.
Is there any existing algorithm to solve this task efficiently?

+2  A: 

You might try A technique for isolating differences between files (via here):

An algorithm which uses the 'move' operator is described in P. Heckel's 1978 paper

(Sorry for the scribd interface, but I guess the paper has not been OCR'd.)

LarsH
Wow... Scribed is awful. I would rather have adobe acrobat open up in my browser than that monstrosity. No offense to you of course, I've just never had the 'pleasure' of using it, opting for citeseerx and the like.
nlucaroni
@nlucaroni: Yeah I agree. If you can find a link to Heckel's paper (full text) somewhere else on the web, please post it.
LarsH
A: 

Yes; there are many algorithms and theory relating to biology; genome alignment and chromosome rearrangement. Without knowing your data, it's really hard to mention something more specific. I mention pancake sorting as a measure of rearrangement on another stackoverflow post, there are also a few other great options (compression, specifically). Of course that method will not be able to break apart your data into blocks. Dealing with small sequence data you should have no problem generating all groupings.

nlucaroni
A: 

The abstract for Block Edit Models for Approximate String Matching looks promising.

j_random_hacker