I'm looking for existing ideas or solutions to the problem of finding differences between two directories. Specifically, how can I identify files that might have been changed, renamed, and moved?
A short list of things I've considered:
- try to pair up files missing from dir A with new files in dir B by using some heuristic, such as a 75% match in content. This just doesn't seem robust enough (problem cases include significant changes in content, compression or encryption, and possible multiple matches)
- use alternate data streams to add an id to each file. This would work only on NTFS.
- add a header/footer containing an id to each file. There's no way to guarantee the header/footer will not corrupt the file.
- ask for user input on each change to determine whether a file was indeed deleted or simply moved. This is too hard on the user.
- require the user to rename/move files only through special commands that keep track of such changes. This is also too hard on the user.
- set up a file system watcher to catch changes on the fly. This has several issues (the watcher must run at all times, is platform specific...)
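
To make the first option concrete, here is a rough Python sketch of the kind of pairing heuristic I have in mind. It is only an illustration under my own assumptions: the names `snapshot` and `pair_moved_files` are made up, the 0.75 threshold is the arbitrary 75% figure from above, and it reads every file fully into memory and compares byte sequences with `difflib.SequenceMatcher`, which would likely be far too slow for large files or large trees.

```python
import hashlib
from difflib import SequenceMatcher
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root, POSIX style) to its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }


def pair_moved_files(dir_a, dir_b, threshold=0.75):
    """Guess which files missing from dir_a reappear in dir_b under a new path.

    Returns (moves, deleted, added), where moves is a list of
    (old_path, new_path, similarity) tuples.
    """
    a, b = snapshot(dir_a), snapshot(dir_b)
    missing = sorted(set(a) - set(b))  # paths present only in A
    new = sorted(set(b) - set(a))      # paths present only in B
    moves = []

    # Pass 1: identical content (same SHA-256) is treated as a pure rename/move.
    by_hash = {}
    for path in new:
        by_hash.setdefault(hashlib.sha256(b[path]).hexdigest(), []).append(path)
    for old in list(missing):
        candidates = by_hash.get(hashlib.sha256(a[old]).hexdigest())
        if candidates:
            target = candidates.pop(0)
            moves.append((old, target, 1.0))
            missing.remove(old)
            new.remove(target)

    # Pass 2: fuzzy matching for files that were both moved and edited.
    scored = []
    for old in missing:
        for cand in new:
            ratio = SequenceMatcher(None, a[old], b[cand], autojunk=False).ratio()
            if ratio >= threshold:
                scored.append((ratio, old, cand))

    # Greedy assignment, best score first, so each file is paired at most once.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    used_old, used_new = set(), set()
    for ratio, old, cand in scored:
        if old in used_old or cand in used_new:
            continue
        moves.append((old, cand, ratio))
        used_old.add(old)
        used_new.add(cand)

    deleted = [p for p in missing if p not in used_old]
    added = [p for p in new if p not in used_new]
    return moves, deleted, added
```

Calling `pair_moved_files("dirA", "dirB")` returns the guessed moves plus whatever is left over as deleted or added. It shows exactly the weaknesses listed above: a heavily edited, compressed, or encrypted file falls below the threshold and is reported as a delete plus an add, and when several candidates score similarly the greedy pass simply picks the highest one.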
Any ideas welcome...