I tried

filecmp.cmp(file1,file2)

but it doesn't work, since the files are identical except for their newline characters. Is there an option for that in filecmp, or some other convenience function or library, or do I have to read both files line by line and compare those?

+1  A: 

Try the difflib module - it provides classes and functions for comparing sequences.

For your needs, the difflib.Differ class looks interesting.

class difflib.Differ

This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.

See the Differ example in the difflib documentation, which compares two texts. The sequences being compared can also be obtained from the readlines() method of file-like objects.

gimel
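
To make the difflib suggestion above concrete, here is a minimal sketch, assuming Python 3. The helper name `diff_lines` is made up for illustration. It reads both files with universal newline translation, so "\r\n" and "\n" endings compare equal, and returns only the Differ lines that mark a difference. Note that a missing final newline in one file still counts as a difference here.

```python
import difflib

def diff_lines(path1, path2):
    """Return only the Differ output lines that mark a difference."""
    # newline=None (the Python 3 default) turns "\r\n" and "\r" into "\n" on read.
    with open(path1, newline=None) as f1, open(path2, newline=None) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # Differ prefixes unchanged lines with two spaces; keep everything else.
    return [line for line in difflib.Differ().compare(lines1, lines2)
            if not line.startswith("  ")]
```

An empty result means the files match apart from their line endings; otherwise the list shows which lines differ.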
+2  A: 

I think a simple convenience function like this should do the job:

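from itertools import izip_longest

def areFilesIdentical(filename1, filename2):
    # "U" is universal newline mode: "\r\n" and "\r" are read as "\n"
    with open(filename1, "rtU") as a:
        with open(filename2, "rtU") as b:
            # Note that "all" and "izip_longest" are lazy
            # (will stop at the first line that's not identical).
            # Unlike "izip", "izip_longest" pads the shorter file with None,
            # so files with a different number of lines are not identical.
            return all(lineA == lineB
                       for lineA, lineB in izip_longest(a, b))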
AndiDog
I'm curious what the 't' modifier does.
compie
Well actually it's ignored and you can just write "rU". The file object ensures that all newlines are represented as '\n' with the universal newline mode ("U").
AndiDog
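
The answer and comments above are written for Python 2. As a rough sketch of the same idea on Python 3 (an assumption on my part, since the thread does not mention a version): text mode with `newline=None` already performs universal newline translation, and `itertools.zip_longest` takes the place of the Python 2 itertools helpers. The name `are_files_identical` is only illustrative.

```python
from itertools import zip_longest

def are_files_identical(filename1, filename2):
    # Text mode with newline=None translates "\r\n" and "\r" to "\n" on read.
    with open(filename1, newline=None) as a, open(filename2, newline=None) as b:
        # zip_longest pads the shorter file with None, so a file that is
        # merely a prefix of the other is not reported as identical.
        # all() short-circuits at the first pair of lines that differ.
        return all(line_a == line_b for line_a, line_b in zip_longest(a, b))
```

Files are iterated lazily, so large files are never loaded into memory as a whole.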
A: 

It looks like you just need to check whether the files are the same, ignoring whitespace/newlines.

You can use a function like this:

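def do_cmp(f1, f2):
    bufsize = 8*1024
    # "with" closes both files, even on an early return
    with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
        while True:
            b1 = fp1.read(bufsize)
            b2 = fp2.read(bufsize)
            if not is_same(b1, b2):
                return False
            # stop only when both files are exhausted
            if not b1 and not b2:
                return True

def is_same(text1, text2):
    # Caveat: newlines are stripped per chunk, so once the two files
    # contain different numbers of newline bytes the chunks stop lining up,
    # and files larger than bufsize can be reported as different.
    return strip_newlines(text1) == strip_newlines(text2)

def strip_newlines(data):
    # byte literals, because the files are opened in binary mode
    return data.replace(b"\r", b"").replace(b"\n", b"")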

You can adapt is_same to match your requirements, e.g. you may want to ignore case too.

Anurag Uniyal
Byte strings don't have a `remove` method. I guess you mean something like `.replace("\r\n", "").replace("\n", "").replace("\n\r", "")`? And this won't work if the '\r' is at the end of one buffer and the '\n' is at the beginning of the next buffer.
AndiDog
@AndiDog Yes, I meant replace, thanks! The point is that inside `is_same` the OP can do whatever comparison he needs, e.g. `.replace("\n","").replace("\r","")`.
Anurag Uniyal
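
The buffer-boundary problem raised in the comments can be avoided by stripping each chunk and carrying over whatever has not been compared yet. The following is only a sketch under assumptions: Python 3, and a hypothetical function name `same_ignoring_newlines`. Because every "\r" and "\n" byte is removed, line boundaries are ignored entirely, so "ab\ncd" and "abcd" count as the same.

```python
def same_ignoring_newlines(path1, path2, bufsize=8 * 1024):
    def strip(chunk):
        # Drop every carriage return and line feed byte.
        return chunk.replace(b"\r", b"").replace(b"\n", b"")

    with open(path1, "rb") as fp1, open(path2, "rb") as fp2:
        pending1 = pending2 = b""
        while True:
            b1 = fp1.read(bufsize)
            b2 = fp2.read(bufsize)
            pending1 += strip(b1)
            pending2 += strip(b2)
            # Compare only the part both sides have available so far.
            n = min(len(pending1), len(pending2))
            if pending1[:n] != pending2[:n]:
                return False
            # Keep the uncompared remainder for the next round.
            pending1, pending2 = pending1[n:], pending2[n:]
            if not b1 and not b2:
                # Both files exhausted: equal only if nothing is left over.
                return not pending1 and not pending2
```

Since each chunk is stripped on its own, a "\r" at the end of one buffer and a "\n" at the start of the next are both removed, wherever the split falls.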