I need to compute diffs between Java strings. I would like to be able to rebuild a string from the original string and the diffs between versions. Has anyone done this in Java? What library do you use?

String a1; // This can be a long text
String a2; // e.g. the above text with spelling corrections
String a3; // e.g. the above text with spelling corrections and an additional sentence

// Diff is a hypothetical class showing the kind of API I am looking for
String differences_a1_a2 = Diff.getDifferences(a1, a2);
String differences_a2_a3 = Diff.getDifferences(a2, a3);
String[] diffs = new String[]{a1, differences_a1_a2, differences_a2_a3};
String new_a3 = Diff.build(diffs);
a3.equals(new_a3); // this should be true
+2  A: 

Apache Commons Lang has a String diff method:

org.apache.commons.lang.StringUtils

StringUtils.difference("foobar", "foo");
Paul Whelan
It returns the remainder of the second String, starting from where it differs from the first. That is not efficient enough for me, since I will be working with large texts. See: StringUtils.difference("ab", "abxyz") -> "xyz"; StringUtils.difference("ab", "xyzab") -> "xyzab".
Sergio del Amo
+1  A: 

Use the Levenshtein distance and extract the edit log from the matrix the algorithm builds up. The Wikipedia article links to a couple of implementations; I'm sure there's a Java implementation among them.

Levenshtein distance is closely related to the Longest Common Subsequence problem; you might also want to have a look at that.
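
To make the "extract the edit log from the matrix" idea concrete, here is a minimal sketch in plain Java. The class and method names (LevenshteinEditScript, editScript, apply) and the one-character op prefixes are my own assumptions for illustration, not part of any library. It fills the full matrix, so it needs O(m*n) memory and is only meant for short inputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
final class LevenshteinEditScript {

    /** Fills the classic (m+1) x (n+1) Levenshtein matrix for a and b. */
    static int[][] matrix(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d;
    }

    /**
     * Walks the matrix back from the bottom-right corner.
     * Ops: "=c" keep, "~c" substitute with c, "-c" delete c, "+c" insert c.
     */
    static List<String> editScript(String a, String b) {
        int[][] d = matrix(a, b);
        List<String> ops = new ArrayList<>();
        int i = a.length(), j = b.length();
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && a.charAt(i - 1) == b.charAt(j - 1)
                    && d[i][j] == d[i - 1][j - 1]) {
                ops.add("=" + a.charAt(i - 1)); i--; j--;
            } else if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1) {
                ops.add("~" + b.charAt(j - 1)); i--; j--;
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                ops.add("-" + a.charAt(i - 1)); i--;
            } else {
                ops.add("+" + b.charAt(j - 1)); j--;
            }
        }
        Collections.reverse(ops); // the walk produced the ops back to front
        return ops;
    }

    /** Rebuilds the target from the source plus the edit log. */
    static String apply(String source, List<String> ops) {
        StringBuilder out = new StringBuilder();
        int pos = 0; // read position in the source
        for (String op : ops) {
            switch (op.charAt(0)) {
                case '=': out.append(source.charAt(pos++)); break;
                case '~': out.append(op.charAt(1)); pos++; break;
                case '-': pos++; break;
                default:  out.append(op.charAt(1)); break; // '+'
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> ops = editScript("kitten", "sitting");
        System.out.println(ops);
        System.out.println(apply("kitten", ops));
    }
}
```

The number of ops that are not "=" equals the Levenshtein distance. For long texts you would probably want to run this per line or per word rather than per character.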

Torsten Marek
+9  A: 

This library seems to do the trick: google-diff-match-patch. It can create a patch string from the differences and lets you reapply the patch.
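
For readers who just want to see the create-a-patch / reapply-the-patch round trip from the question, here is a rough sketch in plain Java. To be clear, this is not the diff-match-patch API: SimplePatch, diff, applyTo and build are hypothetical names, and the diff is a crude single-hunk one that only trims the common prefix and suffix. The library itself produces finer-grained diffs.

```java
// Hypothetical single-hunk patch, for illustration only.
final class SimplePatch {
    final int start;        // index in the original where the change begins
    final int removed;      // number of original characters to drop
    final String inserted;  // text that replaces the dropped characters

    SimplePatch(int start, int removed, String inserted) {
        this.start = start;
        this.removed = removed;
        this.inserted = inserted;
    }

    /** Describes how to turn 'from' into 'to' by trimming the common prefix and suffix. */
    static SimplePatch diff(String from, String to) {
        int max = Math.min(from.length(), to.length());
        int prefix = 0;
        while (prefix < max && from.charAt(prefix) == to.charAt(prefix)) {
            prefix++;
        }
        int suffix = 0;
        // The suffix must not overlap the prefix in either string.
        while (suffix < max - prefix
                && from.charAt(from.length() - 1 - suffix) == to.charAt(to.length() - 1 - suffix)) {
            suffix++;
        }
        return new SimplePatch(prefix,
                from.length() - prefix - suffix,
                to.substring(prefix, to.length() - suffix));
    }

    /** Applies this patch to the exact string it was computed from. */
    String applyTo(String from) {
        return from.substring(0, start) + inserted + from.substring(start + removed);
    }

    /** Replays a chain of patches on top of the original text. */
    static String build(String original, SimplePatch... patches) {
        String current = original;
        for (SimplePatch p : patches) {
            current = p.applyTo(current);
        }
        return current;
    }

    public static void main(String[] args) {
        String a1 = "Teh quick brown fox.";
        String a2 = "The quick brown fox.";
        String a3 = "The quick brown fox. It jumps.";
        String rebuilt = build(a1, diff(a1, a2), diff(a2, a3));
        System.out.println(rebuilt.equals(a3));
    }
}
```

The weakness of the single-hunk approach is that two small edits far apart force the patch to carry everything in between, which is why a real diff library is the better choice for long texts.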

bernardn
+2  A: 

As Torsten says, you can use

import org.apache.commons.lang.StringUtils;

System.err.println(StringUtils.getLevenshteinDistance("foobar", "bar"));
Paul Whelan
A: 

If you need to deal with differences between large amounts of data and want the differences efficiently compressed, you could try a Java implementation of xdelta, which in turn implements RFC 3284 (VCDIFF) for binary diffs (it should work with strings too).
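
To give a feel for how this family of delta formats works, here is a toy sketch in plain Java. It is neither xdelta nor the VCDIFF wire format. It only borrows the general idea of describing the target as COPY instructions (reuse a range of the source) and ADD instructions (literal new data). The names CopyAddDelta, Op, encode and decode and the BLOCK size of 4 are my own assumptions, and it works on chars rather than bytes to keep it readable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy COPY/ADD delta, for illustration only (not the VCDIFF format).
final class CopyAddDelta {
    static final int BLOCK = 4; // shortest match worth a COPY (arbitrary choice)

    /** One instruction: ADD when data is non-null, otherwise COPY from the source. */
    static final class Op {
        final int offset;
        final int length;
        final String data;

        private Op(int offset, int length, String data) {
            this.offset = offset;
            this.length = length;
            this.data = data;
        }

        static Op copy(int offset, int length) { return new Op(offset, length, null); }
        static Op add(String data) { return new Op(0, data.length(), data); }
    }

    static List<Op> encode(String source, String target) {
        // Index every BLOCK-character window of the source by its first position.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i + BLOCK <= source.length(); i++) {
            index.putIfAbsent(source.substring(i, i + BLOCK), i);
        }
        List<Op> ops = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        int t = 0;
        while (t < target.length()) {
            Integer s = t + BLOCK <= target.length()
                    ? index.get(target.substring(t, t + BLOCK))
                    : null;
            if (s == null) {
                literal.append(target.charAt(t++)); // no match here: literal data
                continue;
            }
            int len = BLOCK;
            // Extend the match as far as both strings agree.
            while (s + len < source.length() && t + len < target.length()
                    && source.charAt(s + len) == target.charAt(t + len)) {
                len++;
            }
            if (literal.length() > 0) {
                ops.add(Op.add(literal.toString()));
                literal.setLength(0);
            }
            ops.add(Op.copy(s, len));
            t += len;
        }
        if (literal.length() > 0) {
            ops.add(Op.add(literal.toString()));
        }
        return ops;
    }

    static String decode(String source, List<Op> ops) {
        StringBuilder out = new StringBuilder();
        for (Op op : ops) {
            if (op.data != null) {
                out.append(op.data);
            } else {
                out.append(source, op.offset, op.offset + op.length);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "the quick brown fox jumps over the lazy dog";
        String target = "the quick red fox jumps over the lazy dog!";
        List<Op> ops = encode(source, target);
        System.out.println(ops.size() + " ops, round trip ok: "
                + decode(source, ops).equals(target));
    }
}
```

RFC 3284 additionally defines a RUN instruction and a compact binary encoding, so a real implementation will produce much smaller deltas than this sketch.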

Alexander
A: 

The java-diff-utils library might also be useful.

iobit