Some of these solutions could probably apply to Windows as well, but I am not familiar with Windows, so this question is Linux-focused.
As far as I understand, Unix file systems all have the concept of inodes, which is where the file system metadata and the "file" are stored. So I am wondering: is it possible to use the inode number, together with some additional information, to track files that are renamed or moved around?
What I am proposing is an initial scan that creates a database of each file's name/path, the disk/drive it is located on, its inode number, and finally some sort of checksum (SHA-1).
The system could then use the inode number to quickly detect whether a file was moved or renamed, and follow up with the checksum to check that it's actually the same file.
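To make the idea concrete, here is a minimal sketch of the scan and the move detection in Python. The function names (`sha1_of`, `scan`, `find_moves`) and the in-memory dictionary standing in for the database are my own placeholders, not an existing tool. It keys each file on the pair `(st_dev, st_ino)` and, as a simplification, keeps only one path per inode, so hard links are not handled here.

```python
import hashlib
import os
import stat


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Map (st_dev, st_ino) -> (path, sha1) for regular files under root.

    Simplification: if several paths share an inode (hard links),
    only the last one visited is kept.
    """
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat: do not follow symlinks
            if not stat.S_ISREG(info.st_mode):
                continue
            snapshot[(info.st_dev, info.st_ino)] = (path, sha1_of(path))
    return snapshot


def find_moves(old, new):
    """Return (old_path, new_path, content_unchanged) for inodes whose path changed."""
    moves = []
    for key, (old_path, old_sum) in old.items():
        if key in new:
            new_path, new_sum = new[key]
            if new_path != old_path:
                moves.append((old_path, new_path, old_sum == new_sum))
    return moves
```

Calling `scan()` before and after a rename and passing both snapshots to `find_moves()` should report the rename, with the third element of each tuple saying whether the checksum still matches. Hashing every file on every scan is the expensive part; the inode lookup itself is just a dictionary probe.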
I can see some possible problems with this scheme:
- Files could be modified and then moved/renamed, and this scheme would fail to detect the move because the checksum would no longer match.
- Some (most?) applications, when they modify a file, create a new temporary file and then swap it in for the current one, so the inode wouldn't match anyway, even if the file ends up unmodified.
- I would need to store which device/file system each file is on, because inode numbers are unique only within a single file system.
- I would need to deal with hard links.
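Two of the problems above (the temp-file swap and hard links) are easy to reproduce. The sketch below is only an illustration under my own assumptions: `file_key`, `rewrite_via_temp`, and `demo` are hypothetical helper names, and it assumes the temporary directory lives on a file system that supports hard links.

```python
import os
import tempfile


def file_key(path):
    """Return (st_dev, st_ino); the device is needed because inode
    numbers are only unique within one file system."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)


def rewrite_via_temp(path, data):
    """Replace path the way many applications do:
    write a temporary file next to it, then rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp_path, path)


def demo():
    """Return (link_shares_inode, link_count, inode_changed_after_rewrite)."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "a.txt")
        with open(original, "wb") as handle:
            handle.write(b"hello")

        # A hard link is a second name for the same inode.
        link = os.path.join(root, "b.txt")
        os.link(original, link)
        shares_inode = file_key(original) == file_key(link)
        link_count = os.stat(original).st_nlink

        # Rewriting with identical bytes still yields a different inode.
        before = file_key(original)
        rewrite_via_temp(original, b"hello")
        inode_changed = file_key(original) != before
        return shares_inode, link_count, inode_changed
```

On a typical Linux file system I would expect `demo()` to show that the two hard-linked names share one key, that the link count is 2, and that the rewritten file has a new inode even though its contents are byte-for-byte identical. That last case is where only the checksum, not the inode, can tie the new file back to the old record.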
Are there any other gotchas that I am forgetting about here? I was hoping to use the inodes to quickly track down which files were moved or renamed, then follow up with a checksum to confirm that each one is actually the same file.