Hi all,

I need to work with large files and must find the differences between two of them. I don't need the differing bits themselves, just the number of differences.

To count the differing rows I came up with:

diff --suppress-common-lines --speed-large-files -y File1 File2 | wc -l

It works, but is there a better way to do it?

And how can I count the exact number of differences (with standard tools like bash, diff, awk, sed, or some old version of Perl)?
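For the "exact number of differences" part, it helps to decide first whether a difference means a line or a change region (hunk). The sketch below assumes GNU diff's default ("normal") output format; `count_changed_lines` and `count_hunks` are hypothetical helper names, not existing tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes GNU diff's default ("normal") output format.
# In that format each change region (hunk) starts with a command line such as
# "2c2", "5a6" or "3,4d2"; lines from FILE1 start with "< ", lines from FILE2
# start with "> ", and "---" separates the two sides of a change.

# Count differing lines: every "<" or ">" line is counted, so a line that was
# edited in place contributes 2 (its old and its new version).
count_changed_lines() {
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" hides that status.
    diff "$1" "$2" | grep -c '^[<>]' || true
}

# Count change regions (hunks): a run of adjacent differing lines counts once.
count_hunks() {
    diff "$1" "$2" | grep -c '^[0-9]' || true
}

# Example usage:
#   count_changed_lines File1 File2
#   count_hunks File1 File2
```

For two files that differ by one line edited in place plus one appended line, this reports 3 changed lines in 2 hunks. The side-by-side command from the question shows an in-place edit as a single row, so the three ways of counting can legitimately disagree.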

Thanks in advance

A: 

This thread seems to discuss your question.

luvieere
Care to copy a line of the solution you'd use? The URL might go out of date one day.
Vytautas Shaltenis
Downvote: as I mentioned, I'm on Linux. Linux's GNU diff uses a different output format than Solaris diff by default, and by default it shows the differing line from both files, so any solution recommended on the Unix.com forums would give a wrong answer.
Zsolt Botykai
Zsolt, I don't follow you. Did you even look at that thread and what's suggested there? I just tested with GNU diff 2.8.1 and it works fine for me.
Vytautas Shaltenis
A: 
diff -U 0 file1 file2 | grep -v ^@ | wc -l

Subtract 2 from that result for the two file-name header lines at the top of the diff listing. Unified format is probably a bit faster than side-by-side format.
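The manual subtraction can be folded into the pipeline. This is a minimal sketch, assuming GNU diff and any POSIX awk; `count_unified` is a hypothetical name. It also reports removed and added lines separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes GNU diff and a POSIX awk.
# With -U 0 there are no context lines, so after the two file-name header
# lines ("--- file1 ..." and "+++ file2 ...") every line is a hunk header
# ("@@ ... @@"), a removed line ("-...") or an added line ("+...").
# Skipping the first two records (NR > 2) replaces the manual "minus 2".
# "\ No newline at end of file" markers start with a backslash and are not counted.
# Prints: removed added total
count_unified() {
    diff -U 0 "$1" "$2" | awk '
        NR > 2 && /^[-]/ { removed++ }
        NR > 2 && /^[+]/ { added++ }
        END { printf "%d %d %d\n", removed, added, removed + added }'
}

# Example usage:
#   count_unified file1 file2
```

When the files are identical diff prints nothing, and the `END` block still runs, so the result is `0 0 0` rather than a negative number from subtracting 2.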

John Kugelman
A: 

If you're on Linux/Unix, what about comm -23 file1 file2 to print the lines in file1 that aren't in file2, comm -23 file1 file2 | wc -l to count them, and similarly comm -13 ... for lines only in file2? Note that comm expects both files to be sorted.
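A sketch of the comm approach, assuming bash (for the `<(...)` process substitution) and that sorting the inputs first is acceptable; `count_unique` is a hypothetical name. Forcing the C locale keeps sort and comm on the same collation order and is usually faster on large files.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes bash and the standard sort, comm and wc tools.
# comm prints three columns: lines only in FILE1, lines only in FILE2, lines in both.
# -23 suppresses columns 2 and 3 (leaving lines only in FILE1);
# -13 suppresses columns 1 and 3 (leaving lines only in FILE2).
# Caveat: the files are compared as sorted collections of lines, so a line that
# merely moved to another position is NOT counted, unlike with diff.
# Prints: only_in_file1 only_in_file2
count_unique() {
    local only1 only2
    only1=$(LC_ALL=C comm -23 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2") | wc -l)
    only2=$(LC_ALL=C comm -13 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2") | wc -l)
    # $((...)) strips the leading spaces some wc implementations print.
    echo "$((only1)) $((only2))"
}

# Example usage:
#   count_unique file1 file2
```

Because of the sorting step this answers a slightly different question than diff does: it counts lines missing from the other file, not edits in sequence.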

profjim