views:

117

answers:

2

Does anybody know of a diff-like tool that can show me the changes between two text files, but ignore changes in whitespace including newlines?

Here's an example:

the quick brown fox jumped over the lazy bear.  the quick brown fox
jumped over the lazy bear.  the quick brown fox jumped over the lazy
bear.  the quick brown fox jumped over the lazy bear.

quick brown fox jumped over the lazy bear.  the quick brown fox jumped
over the lazy bear.  the quick brown fox jumped over the lazy bear.
the quick brown fox jumped over the lazy bear.

All I did was delete one word and reflow it, but "diff -b" detects a change on every line (as it should; I'm not saying this is a bug in diff). But for large LaTeX files this is a major problem; change one word in a long paragraph and the diff you get back is basically useless.

By the way, I'm aware that this requires way more computational power than the usual lines-are-atomic diff. I'm only doing this on small human-generated files and am happy to wait a long time if I have to.

+3  A: 

wdiff does word-by-word alignment.

Craig Peterson
WARNING: wdiff may not be available on every system. But it is a cool utility.
DVK
Hooray! That is exactly what I wanted. Now I just have to wait for Stack Overflow to let me declare this the answer.
Adam
A: 

One option is to split each file into one word per line and diff the results. You lose some of the surrounding context in the output, but the diff is very fine-tuned to the type of change you care about.

Example:

# -p makes perl read each input line, apply the substitution, and print it;
# without it the one-liner prints nothing.
perl -pe 's/\s+/\n/g' file1 > file1.split_words
perl -pe 's/\s+/\n/g' file2 > file2.split_words
diff file1.split_words file2.split_words

You can do even better if the text has special properties. Specifically, if reflow only happens within a paragraph, and paragraphs are separated by two newlines in a row (a blank line), simply replace every single newline with a space and run a regular diff -w on the results.
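Here is a rough sketch of that paragraph-joining idea, in case it helps. It assumes paragraphs are separated by truly empty lines and uses awk rather than perl for the joining step. The function names join_paragraphs and paradiff are made up for illustration, and mktemp -d is assumed to be available (it is common but not strictly POSIX).

```shell
#!/bin/sh
# Sketch only: join_paragraphs and paradiff are illustrative names, not existing tools.

# Print each blank-line-separated paragraph on a single line.
# RS="" puts awk in paragraph mode; "$1 = $1" rebuilds the record with
# single spaces between words, so line breaks and runs of spaces disappear.
join_paragraphs() {
    awk 'BEGIN { RS = "" } { $1 = $1; print }' "$1"
}

# Diff the joined versions of two files.
# Returns diff's exit status: 0 if the same, 1 if different.
paradiff() {
    tmpdir=$(mktemp -d) || return 2
    join_paragraphs "$1" > "$tmpdir/a"
    join_paragraphs "$2" > "$tmpdir/b"
    status=0
    diff "$tmpdir/a" "$tmpdir/b" || status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

Usage would be something like: paradiff file1 file2. A reflowed paragraph with no word changes produces no output at all. A paragraph with a changed word still shows up as one long changed line, so this narrows the diff down to the affected paragraph but not to the word.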

DVK