I have two files (f1 and f2) containing some text (or binary data).
How can I quickly find common blocks?

e.g.
f1: ABC DEF
f2: XXABC XEF

output:

common blocks:
length 4: "ABC " in f1@0 and f2@2
length 2: "EF" in f1@5 and f2@7

+2  A: 

Duplo is a great tool for such purposes: http://sourceforge.net/projects/duplo/

torial
+1  A: 

Wikipedia has pseudocode for finding the longest common substring of two sequences of data. In your case, you simply extract from the table every common substring that is not a prefix of a longer common substring (i.e. the maximal common substrings).
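
A rough Python sketch of that idea, in case it helps. It is not the Wikipedia pseudocode verbatim: the function name `common_blocks` and the `min_len` cutoff are my own assumptions, and offsets are zero-based. It fills the usual common-suffix table row by row and reports a match only when it cannot be extended to the right.

```python
def common_blocks(a, b, min_len=2):
    """Return (length, block, pos_a, pos_b) for each maximal common block.

    Works on str or bytes. min_len is a hypothetical cutoff to skip
    trivial one-character matches.
    """
    blocks = []
    # prev[j] holds the length of the common suffix ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                # The run is maximal if the next pair of symbols differs
                # or either input ends here.
                at_end = i == len(a) or j == len(b)
                if cur[j] >= min_len and (at_end or a[i] != b[j]):
                    n = cur[j]
                    blocks.append((n, a[i - n:i], i - n, j - n))
        prev = cur
    # Longest blocks first
    return sorted(blocks, reverse=True)


for length, block, pos_a, pos_b in common_blocks("ABC DEF", "XXABC XEF"):
    print(f"length {length}: {block!r} in f1@{pos_a} and f2@{pos_b}")
```

This takes time proportional to len(f1) * len(f2), so it is only practical for fairly small inputs; for large files a suffix-tree or suffix-array approach would probably be a better fit.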

Torsten Marek
+1  A: 

The open-source PMD project has a cut-and-paste detector module which is mentioned on this page: http://pmd.sourceforge.net/integrations.html.

David Medinets