tags:

views:

175

answers:

2

I'd like to align two lists in a similar way to what difflib.Differ would do except I want to be able to define a match function for comparing items, not just use string equality, and preferably a match function that can return a number between 0.0 and 1.0, not just a boolean.

So, for example, say I had the two lists:

L1 = [('A', 1), ('B', 3), ('C', 7)]
L2 = ['A', 'b', 'C']

and I want to be able to write a match function like this:

def match(item1, item2):
    if item1[0] == item2:
        return 1.0
    elif item1[0].lower() == item2.lower():
        return 0.5
    else:
        return 0.0

and then do:

d = Differ(match_func=match)
d.compare(L1, L2)

and have it diff using the match function. Like difflib, I'd rather the algorithm gave more intuitive Ratcliff-Obershelp type results rather than a purely minimal Levenshtein distance.

A: 

I recently ran across a discussion of an algorithm called patience diff that sounds rather simple. You could try implementing that yourself, and then of course you can have it use whatever comparison algorithm you like.

Carl Smotricz
if it's used in bzr that may mean a Python implementation already exists
James Tauber
+3  A: 

I just wrote this implementation of Needleman-Wunsch and it seems to do what I want:

def nw_align(a, b, replace_func, insert, delete):

    ZERO, LEFT, UP, DIAGONAL = 0, 1, 2, 3

    len_a = len(a)
    len_b = len(b)

    matrix = [[(0, ZERO) for x in range(len_b + 1)] for y in range(len_a + 1)]

    for i in range(len_a + 1):
        matrix[i][0] = (insert * i, UP)

    for j in range(len_b + 1):
        matrix[0][j] = (delete * j, LEFT)

    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            replace = replace_func(a[i - 1], b[j - 1])
            matrix[i][j] = max([
                (matrix[i - 1][j - 1][0] + replace, DIAGONAL),
                (matrix[i][j - 1][0] + insert, LEFT),
                (matrix[i - 1][j][0] + delete, UP)
            ])

    i, j = len_a, len_b
    align_a = ""
    align_b = ""

    while (i, j) != (0, 0):
        if matrix[i][j][1] == DIAGONAL:
            align_a += a[i - 1]
            align_b += b[j - 1]
            i -= 1
            j -= 1
        elif matrix[i][j][1] == LEFT:
            align_a += "-"
            align_b += b[j - 1]
            j -= 1
        else: # UP
            align_a += a[i - 1]
            align_b += "-"
            i -= 1

    return align_a[::-1], align_b[::-1]
James Tauber
+1 for self follow-up, it's appreciated
msw