ansaurus

Question

Can I get a percentage of how much one file differs from another?

Answer 1

A:

I am not sure how you would want to measure percentages. You could however cook up a script which reads the output of your diff command and somehow calculates percentages. But first you need to know which metric you want to use.
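
As a minimal sketch of one possible metric: count the lines that diff marks as removed or added, and divide by the combined line count of both files. The function name `diff_percent` is made up for illustration, and the metric itself is an assumption rather than the only reasonable choice.

```shell
#!/bin/sh
# diff_percent FILE1 FILE2
# Hypothetical helper: percentage of lines that differ, defined here as
# (lines only in FILE1 + lines only in FILE2) / (total lines in both files).
diff_percent() {
    # In default diff output, "<" marks lines only in the first file and
    # ">" marks lines only in the second. diff exits non-zero when the
    # files differ, so its status is ignored.
    changed=$( { diff "$1" "$2" || true; } |
        awk '/^[<>]/ { n++ } END { print n + 0 }' )
    total=$(cat "$1" "$2" | wc -l)
    awk -v c="$changed" -v t="$total" \
        'BEGIN { if (t == 0) print "0.0"; else printf "%.1f\n", 100 * c / t }'
}
```

Called as `diff_percent old.txt new.txt`, this prints a single number. A changed line counts twice (once as removed, once as added), which is why the total covers both files.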

Robert Klemme 2009-08-10 12:40:40

Answer 2

A:

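Look up the program "diffstat". It reads the output of diff and summarizes it as a per-file histogram of inserted, deleted, and modified lines, which gives you numbers to work from.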

daed 2009-08-10 12:41:30

Answer 3

+1 A:

Since you're diffing binaries, diff or diffstat are not very useful. The notion of "difference" is also not as clear as with line-oriented text files.

One idea is to use a binary diff tool such as bsdiff or xdelta to generate a binary patch with zero compression and then compare the size of the patch to the size of the original.
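
The comparison step could look roughly like this, assuming the patch file has already been produced by one of those tools. The name `patch_percent` is hypothetical, and the sketch only computes the size ratio; it does not call bsdiff or xdelta itself.

```shell
#!/bin/sh
# patch_percent PATCH ORIGINAL
# Hypothetical helper: size of an already generated binary patch as a
# percentage of the size of the original file.
patch_percent() {
    patch_size=$(wc -c < "$1")
    orig_size=$(wc -c < "$2")
    awk -v p="$patch_size" -v o="$orig_size" \
        'BEGIN { if (o == 0) print "n/a"; else printf "%.1f\n", 100 * p / o }'
}

# Usage (generate delta.patch first with the binary diff tool of your choice,
# with compression turned off where the tool allows it):
#   patch_percent delta.patch old.bin
```

Because a patch carries its own headers and copy instructions, the result can exceed 100 for small or very different files, so treat it as a rough indicator rather than an exact measure.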

laalto 2009-08-10 12:45:09

Answer 4

A:

I'm not exactly sure how you want to define "how much different", but you can count the number of items that exist only in each directory and divide by the total to get a percentage.

# diff -r /tmp /home | awk -F":" '{_[$1]++}END{for(i in _) print _[i],i}'
74 Only in /tmp
29 Only in /home

The above just prints the counts. Define the metric yourself.

ghostdog74 2009-08-10 12:45:18

Just how different each file is from its equivalent on the second path. I actually don't care about the files that are only in one of them at all.

kch 2009-08-10 13:15:28

Answer 5

A:

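This script prints one kind of percentage: the share of bytes that differ between the two files, compared position by position over the length of the shorter file.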

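#!/bin/sh
# Usage: script FILE1 FILE2
# Prints the percentage of differing bytes, measured over the shorter file.

file1="$1"
file2="$2"

# Byte sizes; reading from stdin keeps the file name out of wc's output.
file1size=$( wc -c < "$file1" )
file2size=$( wc -c < "$file2" )

# Compare only as many bytes as the shorter file has.
if [ "$file1size" -lt "$file2size" ]; then
    size=$file1size
else
    size=$file2size
fi

# Avoid dividing by zero when one of the files is empty.
if [ "$size" -eq 0 ]; then
    echo "cannot compare: at least one file is empty" >&2
    exit 1
fi

# cmp -l prints one line per differing byte; -n limits it to $size bytes.
# dc then computes (differing bytes / size) * 100 with 3 decimal places.
dc -e "
3k
$( cmp -n "$size" -l "$file1" "$file2" | wc -l )
$size
/
100*
p"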

Cirno de Bergerac 2009-08-10 13:16:34
