
I know that I can use cmp, diff, etc. to compare two files, but what I am looking for is a utility that gives me the percentage difference between two files.

If there is no such utility, an algorithm would be fine too. I have read about fuzzy programming, but I have not quite understood it.

+11  A: 

You can use the ratio() method of Python's difflib.SequenceMatcher.

From the documentation:

Return a measure of the sequences’ similarity as a float in the range [0, 1].

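For example (the file names are placeholders; substitute your own):

from difflib import SequenceMatcher
file1, file2 = "first.txt", "second.txt"  # hypothetical paths
# Read both files, closing them automatically afterwards
with open(file1) as f1, open(file2) as f2:
    text1 = f1.read()
    text2 = f2.read()
m = SequenceMatcher(None, text1, text2)
print(m.ratio())  # 1.0 means identical, 0.0 means nothing in common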
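The difflib documentation defines that ratio as 2.0*M/T, where T is the total number of elements in both sequences and M is the number of matches. A minimal sketch that recomputes it from get_matching_blocks() and turns it into a "percentage difference" follows. The helper name similarity_stats is hypothetical (not part of difflib), and treating (1 - ratio) * 100 as the percentage difference is an assumption about what the question wants.

```python
from difflib import SequenceMatcher


def similarity_stats(a, b):
    """Return (ratio, manual_ratio, percent_difference) for two sequences.

    Hypothetical helper: percent_difference is assumed to be (1 - ratio) * 100.
    """
    m = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a dummy block of size 0, so summing
    # every block's size gives M, the number of matched elements.
    matches = sum(block.size for block in m.get_matching_blocks())
    total = len(a) + len(b)  # T: total elements in both sequences
    manual = 2.0 * matches / total if total else 1.0  # difflib returns 1.0 for two empty inputs
    ratio = m.ratio()
    return ratio, manual, (1.0 - ratio) * 100.0


ratio, manual, diff_pct = similarity_stats("abcd", "bcde")
print(f"ratio={ratio:.3f} manual={manual:.3f} difference={diff_pct:.1f}%")
# prints: ratio=0.750 manual=0.750 difference=25.0%
```

Here the only matching block is "bcd" (M = 3, T = 8), so the ratio is 6/8 = 0.75.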
Nadia Alramli
Thanks, I did not know about that part of the library.
Mohamed
+1  A: 

It's not an exact duplicate, but there's a lot of useful discussion in this question:

Determining “Owner” of Text Edited by Multiple Users

brien
+1  A: 

It looks like Linux has a utility called dwdiff that can report percentage differences when run with the "-s" flag:

http://www.softpanorama.org/Utilities/diff_tools.shtml
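dwdiff compares files word by word rather than character by character. If you want word-level figures from Python instead, a rough analogue is to split each text on whitespace and let difflib align the two word lists. This is only a sketch under that assumption: it uses difflib's matching, not dwdiff's algorithm, so its numbers need not agree with dwdiff -s output, and the function name word_diff_stats is hypothetical.

```python
from difflib import SequenceMatcher


def word_diff_stats(old_text, new_text):
    """Rough word-level diff statistics (hypothetical helper, NOT dwdiff).

    Words are assumed to be whitespace-separated; difflib does the alignment.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False so frequent words are not silently treated as junk
    m = SequenceMatcher(None, old_words, new_words, autojunk=False)
    stats = {"common": 0, "deleted": 0, "inserted": 0,
             "changed_old": 0, "changed_new": 0}
    for tag, i1, i2, j1, j2 in m.get_opcodes():
        if tag == "equal":
            stats["common"] += i2 - i1
        elif tag == "delete":
            stats["deleted"] += i2 - i1
        elif tag == "insert":
            stats["inserted"] += j2 - j1
        else:  # "replace": words changed on both sides
            stats["changed_old"] += i2 - i1
            stats["changed_new"] += j2 - j1
    old_total = len(old_words)
    # Share of the old text's words that appear unchanged in the new text
    pct_common = 100.0 * stats["common"] / old_total if old_total else 100.0
    return stats, pct_common


stats, pct = word_diff_stats("the quick brown fox", "the slow brown fox jumps")
print(stats, f"{pct:.1f}% of old words unchanged")
```

For this input, "quick" is counted as changed to "slow", "jumps" as inserted, and 3 of the 4 old words (75.0%) as common.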

brien