I am trying to compare two text files and output the first string in the comparison file that does not match, but I am having difficulty since I am very new to Python. Can anybody please give me a sample way to use the difflib module?

When I try something like:

result = difflib.SequenceMatcher(None, testFile, comparisonFile)

I get an error saying object of type 'file' has no len.

A: 

Are you sure both files exist?

I just tested it and I get a perfect result.

To get the results, I use something like:

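import difflib
import sys

# testFile and comparisonFile are the paths of the two files to compare
with open(testFile) as f1, open(comparisonFile) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

# ndiff returns a generator, so iterate over it directly
# instead of calling next() inside a bare try/except
for line in diff:
    sys.stdout.write(line)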

The first character of each output line indicates how the files differ: '+' means the line appears only in the second file, '-' means it appears only in the first, and a space means it is common to both.
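
As a minimal sketch of reading those prefixes, the snippet below scans the ndiff output for the first line marked '+ ', that is, the first line present only in the comparison file. The helper name first_added_line and the sample lists are assumptions made for illustration; they are not part of difflib.

```python
import difflib

def first_added_line(test_lines, comparison_lines):
    """Return the first line found only in comparison_lines, or None.

    Hypothetical helper: relies on ndiff prefixing such lines with '+ '.
    """
    for line in difflib.ndiff(test_lines, comparison_lines):
        if line.startswith('+ '):
            return line[2:]  # strip the two-character ndiff prefix
    return None

# Sample data standing in for the contents of the two files
test_lines = ['alpha\n', 'beta\n', 'gamma\n']
comparison_lines = ['alpha\n', 'BETA\n', 'gamma\n']
result = first_added_line(test_lines, comparison_lines)
```

A return value of None means ndiff reported no added lines, so this can also serve as a rough check for whether the comparison file contains anything the test file lacks.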

RSabet
Oops, you're right, silly mistake. But I'm still not sure how to get the data I need out of result. How do I even know whether they differ or not? How can I get the first string that differs? Sorry, lots of questions :(
outsyncof
+4  A: 

For starters, you need to pass strings to difflib.SequenceMatcher, not files:

# Like so
difflib.SequenceMatcher(None, str1, str2)

# Or just read the files in
difflib.SequenceMatcher(None, file1.read(), file2.read())

That'll fix your error anyway. To get the first non-matching string, I'll direct you to the wonderful world of difflib documentation.
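
One possible way to get at the first non-matching string, sketched here under the assumption that the files are compared line by line, is to walk the result of get_opcodes(). The helper name first_mismatch and the sample data are hypothetical, chosen only for illustration.

```python
import difflib

def first_mismatch(test_lines, comparison_lines):
    """Return the first comparison line that does not match, or None.

    Hypothetical helper built on SequenceMatcher.get_opcodes().
    """
    matcher = difflib.SequenceMatcher(None, test_lines, comparison_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only 'replace' and 'insert' blocks contain comparison-side lines;
        # 'delete' blocks cover lines that exist only on the test side
        if tag in ('replace', 'insert'):
            return comparison_lines[j1]
    return None

# Sample data standing in for file1.readlines() and file2.readlines()
test_lines = ['one\n', 'two\n', 'three\n']
comparison_lines = ['one\n', '2\n', 'three\n']

matcher = difflib.SequenceMatcher(None, test_lines, comparison_lines)
# The inputs differ if any opcode is something other than 'equal'
files_differ = any(op[0] != 'equal' for op in matcher.get_opcodes())
mismatch = first_mismatch(test_lines, comparison_lines)
```

Passing lists of lines rather than whole strings makes the opcodes refer to line indices, which is usually easier to report than character offsets.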

Triptych
@OP: In addition to the docs, have a look at Doug Hellmann's excellent Python module-of-the-week difflib entry: http://blog.doughellmann.com/2007/10/pymotw-difflib.html
Adam Bernier
@Adam - thanks for the link - I'll check it out.
Triptych
+1  A: 

It sounds like you may not need difflib at all. If you're comparing line by line, try something like this:

# Read both files up front so they are closed promptly
with open("test.txt") as test_file, open("correct.txt") as correct_file:
    test_lines = test_file.readlines()
    correct_lines = correct_file.readlines()

for test, correct in zip(test_lines, correct_lines):
    if test != correct:
        print("Oh no! Expected %r; got %r." % (correct, test))
        break
else:
    # No mismatch in the shared lines; check whether one file is longer
    len_diff = len(test_lines) - len(correct_lines)
    if len_diff > 0:
        print("Test file had too much data.")
    elif len_diff < 0:
        print("Test file had too little data.")
    else:
        print("Everything was correct!")
Filip Salomonsson
You don't need readlines there; zip works with file objects too.
SilentGhost
Won't this break if the files have the same number of lines but different content?
outsyncof