I have two versions of a very large, complicated directory structure containing tens of thousands of files, and I want to find the files that changed significantly from one version to the other.
Every file has changed in some minor way. For example, a file called intro.txt might contain:
[Build 1057 done by Mike 12:00] - (version 1)
[Build 1065 done by Mike 18:10] - (version 2)
I don't care about changes like that, since they carry no useful information. I also don't care about spelling corrections or the addition of a word or two.
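One way to make "minor change" concrete is a similarity score from Python's standard-library `difflib`. This is an alternative technique, not something the question asks for. The sketch assumes the files are text you can read into memory. The `min_ratio` cutoff of 0.95 is an arbitrary, hypothetical value. `SequenceMatcher` can be slow on large inputs, so across tens of thousands of files it is probably best applied only to files a cheaper check, such as a size comparison, has already flagged.

```python
import difflib


def is_minor_change(old_text: str, new_text: str, min_ratio: float = 0.95) -> bool:
    """Return True when two versions of a file are nearly identical.

    min_ratio is a hypothetical cutoff. SequenceMatcher.ratio() returns a
    similarity score between 0 and 1, where 1.0 means identical.
    """
    matcher = difflib.SequenceMatcher(None, old_text, new_text)
    # quick_ratio() is a cheap upper bound on ratio(); if even the bound is
    # below the cutoff, skip the more expensive full comparison.
    if matcher.quick_ratio() < min_ratio:
        return False
    return matcher.ratio() >= min_ratio


if __name__ == "__main__":
    body = "Intro text that stays the same.\n" * 5
    v1 = "[Build 1057 done by Mike 12:00]\n" + body
    v2 = "[Build 1065 done by Mike 18:10]\n" + body
    # Only the build stamp differs, so this should count as a minor change.
    print(is_minor_change(v1, v2))
```

A single cutoff works on character-level similarity, so it treats a changed build stamp and a fixed typo the same way. Both lower the score only slightly.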
What I really want is to find the files that have changed significantly. One such change is a large amount of added content, which increases the file size. That is the kind of change I am interested in.
So, how would you recursively walk both directory trees and find the files whose size has increased (or decreased) by a set amount from one version to the next?
I'm running Linux, but pretty much any language will do.
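Here is a minimal Python sketch of the recursive size comparison. It assumes both versions share the same relative layout, so each file is matched by its path relative to its root. The thresholds `--min-bytes` (1024) and `--min-fraction` (0.1) are hypothetical defaults chosen only for illustration. A file is reported only when its size changed by at least both amounts. Files that exist in only one tree are reported with `None` for the missing size.

```python
import argparse
import os


def size_changes(old_root, new_root, min_bytes=1, min_fraction=0.0):
    """Yield (relative_path, old_size, new_size) for files whose size changed
    by at least min_bytes AND by at least min_fraction of the old size.
    A file present in only one tree is yielded with None for the other size.
    """
    for dirpath, _dirnames, filenames in os.walk(old_root):
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            if not os.path.isfile(old_path):  # skip broken symlinks, sockets, etc.
                continue
            rel = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, rel)
            old_size = os.path.getsize(old_path)
            if not os.path.isfile(new_path):
                yield rel, old_size, None
                continue
            new_size = os.path.getsize(new_path)
            delta = abs(new_size - old_size)
            # Relative change; avoid dividing by zero for empty old files.
            if old_size:
                fraction = delta / old_size
            else:
                fraction = float("inf") if delta else 0.0
            if delta >= min_bytes and fraction >= min_fraction:
                yield rel, old_size, new_size
    # Second pass: files that exist only in the new tree.
    for dirpath, _dirnames, filenames in os.walk(new_root):
        for name in filenames:
            new_path = os.path.join(dirpath, name)
            if not os.path.isfile(new_path):
                continue
            rel = os.path.relpath(new_path, new_root)
            if not os.path.isfile(os.path.join(old_root, rel)):
                yield rel, None, os.path.getsize(new_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Report files whose size changed significantly.")
    parser.add_argument("old_root")
    parser.add_argument("new_root")
    parser.add_argument("--min-bytes", type=int, default=1024)  # hypothetical default
    parser.add_argument("--min-fraction", type=float, default=0.1)  # hypothetical default
    args = parser.parse_args()
    for rel, old, new in size_changes(args.old_root, args.new_root, args.min_bytes, args.min_fraction):
        print(f"{rel}\t{old}\t{new}")
```

Usage might look like `python size_diff.py old_version/ new_version/ --min-bytes 4096 --min-fraction 0.25`. The script name is hypothetical. Requiring both thresholds means a small file that doubles in size is skipped unless `--min-bytes` is lowered. Sizes come from `os.path.getsize`, so only file metadata is read, not file contents, which keeps this fast even on large trees.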