I've done this in my tool MetaMake. Here is the recipe:
1. If the index is empty, add the root directory to the index with timestamp == dir.lastModified()-1. The -1 makes the entry look stale, so the directory gets scanned on the first pass.
2. Find all directories in the index.
3. Compare the timestamp of each directory in the index with the one from the file system. This is a fast operation since you have the full path (no scanning of all files/dirs in the tree involved).
4. If the timestamp has changed, something in this directory has changed. Rescan it and update the index:
   - If a directory from the index is now missing, delete its subtree from the index.
   - If you encounter a directory that is already in the index, ignore it (it will be checked in step 2).
   - If you encounter a new directory, add it with timestamp == dir.lastModified()-1. Make sure it gets considered in step 2.
This lets you notice new and deleted files efficiently. Since step 2 only checks known paths, it is very fast: file systems are slow at enumerating all the entries in a directory but fast when you know the exact name.
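For illustration, here is a minimal Python sketch of the recipe above. It is not the MetaMake code. The index layout (a dict mapping each directory path to its timestamp and file names) and the function names `update_index` and `drop_subtree` are assumptions made for this example, and `os.stat(path).st_mtime` stands in for `dir.lastModified()`. It ignores races where a directory disappears between the timestamp check and the rescan.

```python
import os


def drop_subtree(index, path):
    """Remove `path` and every directory below it from the index."""
    prefix = path + os.sep
    for known in [p for p in index if p == path or p.startswith(prefix)]:
        del index[known]


def update_index(root, index):
    """Bring `index` up to date and return the directories that were rescanned.

    `index` maps directory path -> {"mtime": float, "files": set of names}.
    This layout is an assumption for the sketch, not a fixed format.
    """
    root = os.path.abspath(root)
    if not index:
        # Empty index: seed it with the root and a deliberately stale timestamp.
        index[root] = {"mtime": os.stat(root).st_mtime - 1, "files": set()}

    rescanned = []
    pending = list(index)  # all directories currently known to the index
    while pending:
        path = pending.pop()
        if path not in index:
            continue  # already dropped as part of a deleted subtree
        try:
            mtime = os.stat(path).st_mtime  # one stat call on a known path
        except FileNotFoundError:
            drop_subtree(index, path)
            continue
        if mtime == index[path]["mtime"]:
            continue  # timestamp unchanged: nothing was added or removed here

        # Timestamp changed: rescan only this one directory.
        files, subdirs = set(), set()
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.add(entry.path)
                else:
                    files.add(entry.name)

        # Child directories that vanished: drop their subtrees.
        for known in [p for p in index
                      if os.path.dirname(p) == path and p not in subdirs]:
            drop_subtree(index, known)

        # New child directories: add them as stale and check them in this run.
        for sub in subdirs:
            if sub not in index:
                index[sub] = {"mtime": os.stat(sub).st_mtime - 1, "files": set()}
                pending.append(sub)

        index[path] = {"mtime": mtime, "files": files}
        rescanned.append(path)
    return rescanned
```

Call `update_index(root, index)` with an empty dict the first time and keep the dict (or persist it) between runs. A run over an unchanged tree costs one `stat` call per known directory and no directory listings.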
Drawback: You will not notice changed files, because editing a file does not change the timestamp of its directory. If you need this information too, you will have to repeat the algorithm above for the file nodes in your index. This time, you can ignore new/deleted files because they have already been handled during the run over the directories.
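A rough sketch of that second pass over the file nodes, again in Python. The `file_index` dict (file path to last seen timestamp) and the name `find_modified_files` are hypothetical and chosen only for this example.

```python
import os


def find_modified_files(file_index):
    """Return the known files whose timestamp changed and refresh the index.

    `file_index` maps file path -> last seen modification time (assumed layout).
    """
    changed = []
    for path, known_mtime in file_index.items():
        try:
            mtime = os.stat(path).st_mtime  # exact path is known, so this is cheap
        except FileNotFoundError:
            continue  # deletions are handled by the directory pass
        if mtime != known_mtime:
            changed.append(path)
            file_index[path] = mtime  # updating a value is safe while iterating
    return changed
```

Like the directory pass, this only touches paths that are already in the index, so it never has to enumerate a directory.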
[EDIT] Zach mentioned that timestamps are not enough. My reply is: There simply is no other way to do this. The notion of "size" is completely undefined for directories and changes from implementation to implementation. There is no API where you can register "I want to be notified of any change being made to something in the file system". There are APIs which work while your application is alive but if it stops or misses an event, then you're out of sync.
If the file system is remote, things get worse because all kinds of network problems can cause you to get out of sync. So while my solution might not be 100% perfect and watertight, it will work for all but the most contrived cases. And it's the only solution which even gets this far.
Now there is a single kind of application which would want to preserve the timestamp of a directory after making a modification: a virus or worm. This will clearly break my algorithm, but then it's not meant to protect against a virus infection. If you want to protect against this, you must use a completely different approach.
The only other way to achieve what Zach wants is to build a new filesystem which logs this information permanently somewhere, sell it to Microsoft and wait a few years (probably 10 or more) until everyone uses it.