views:

30

answers:

2

Can anyone think of a simple way to highlight the differences between lines in a text file?

Ideally on Linux.

+1  A: 

Levenshtein distance

Wikipedia: The Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.

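// Computes the Levenshtein (edit) distance between s1 and s2 by dynamic programming.
public static int LevenshteinDistance(char[] s1, char[] s2) {
    int s1p = s1.length, s2p = s2.length;
    // num[i][j] = distance between the first i chars of s1 and the first j chars of s2
    int[][] num = new int[s1p + 1][s2p + 1];

    // base case: reducing a prefix of s1 to the empty string takes i deletions
    for (int i = 0; i <= s1p; i++)
        num[i][0] = i;

    // base case: building a prefix of s2 from the empty string takes i insertions
    for (int i = 0; i <= s2p; i++)
        num[0][i] = i;

    // each cell is the cheapest of deletion, insertion, or substitution
    // (substitution costs nothing when the two characters already match)
    for (int i = 1; i <= s1p; i++)
        for (int j = 1; j <= s2p; j++)
            num[i][j] = Math.min(Math.min(num[i - 1][j] + 1,
                    num[i][j - 1] + 1), num[i - 1][j - 1]
                    + (s1[i - 1] == s2[j - 1] ? 0 : 1));

    return num[s1p][s2p];
}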

Sample App in Java

String Diff

The application uses the LCS (longest common subsequence) algorithm to merge two text inputs into one. The result contains a minimal set of instructions for turning one string into the other. The merged text is displayed below the instructions.

Download application: String Diff.jar

Download source: Diff.java
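
For anyone who wants to roll their own, the LCS idea described above can be sketched roughly as follows. This is not the code from Diff.java; it is a minimal illustration, and the class name LcsDiff, the method name diff, and the [-x-] / {+y+} markers are all made up for this example. It works character by character; the same table would work on lines if you compare list elements instead of chars.

```java
public class LcsDiff {

    /**
     * Merges a and b into one string. Characters only in a are wrapped
     * as [-x-], characters only in b as {+y+}. (Hypothetical marker style.)
     */
    public static String diff(String a, String b) {
        int n = a.length(), m = b.length();

        // lcs[i][j] = length of the LCS of a.substring(i) and b.substring(j)
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i][j] = a.charAt(i) == b.charAt(j)
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);

        // Walk the table from the start, emitting keep/delete/insert steps.
        StringBuilder out = new StringBuilder();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.charAt(i) == b.charAt(j)) {
                out.append(a.charAt(i));          // common character: keep
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.append("[-").append(a.charAt(i++)).append("-]");  // only in a
            } else {
                out.append("{+").append(b.charAt(j++)).append("+}");  // only in b
            }
        }
        // Whatever is left over on either side is a pure deletion or insertion.
        while (i < n) out.append("[-").append(a.charAt(i++)).append("-]");
        while (j < m) out.append("{+").append(b.charAt(j++)).append("+}");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(diff("cat", "cut"));   // prints c[-a-]{+u+}t
    }
}
```

The table takes O(n*m) time and memory, so this is fine for short strings or a modest number of lines but not for large files.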

Margus
Thanks Margus, looks like a good algo to know about and add to the list. It might help me roll my own diff highlighting, but ideally I would like to find a library or tool such as the one in my answer.
HaveAGuess
Linux has a powerful 'diff' command; I would look into that.
Margus
A: 

http://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_diff.html

This looks promising. I will update this with more info when I've played with it more.

HaveAGuess