I have a database where each entry has a file path and a last modified field:
1284581625555 C:\docs\text1.txt
1284581646992 C:\docs\text2.txt
1284581654886 C:\docs\text3.txt
1284581662927 C:\docs\subfolder\text4.txt
1284581671986 C:\docs\subfolder\text5.txt
...
Each entry also has a summary of the file contents, and the entries were created by recursively walking down a certain folder (in this case C:\docs) and adding all visited files. Now I'd like to update the database, i.e.
- Add newly created files
- Remove deleted files
- Update modified files
Obviously, I have to walk down the root folder again to see what has changed. But what is the most efficient way to do so?
There are two approaches I can think of:
- First traverse the database, remove all deleted entries and update all modified entries. For this, you have to create a file object from the stored path string for every entry, and call file.exists() or compare file.lastModified() with the stored timestamp. Then recursively walk down the root folder and add files which aren't in the database yet.
- First walk down the file tree and remember in a list what has been added/deleted/modified. This requires having stored a complete snapshot of the previous state of the file tree. Then traverse the database and add/delete/modify entries, based on the previously created list.
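To make the second approach more concrete, here is a rough sketch of what I have in mind. It assumes the previous snapshot is available as a map from path to last-modified timestamp; the names SnapshotDiff, Changes, scan and diff are made up for illustration and are not part of my actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper: compares two snapshots of the file tree.
class SnapshotDiff {

    /** Paths that need work in the database, grouped by kind of change. */
    public static final class Changes {
        public final Set<String> added = new TreeSet<>();
        public final Set<String> removed = new TreeSet<>();
        public final Set<String> modified = new TreeSet<>();
    }

    /** Walks the tree once and records path -> last-modified millis for each regular file. */
    public static Map<String, Long> scan(Path root) throws IOException {
        Map<String, Long> snapshot = new HashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                try {
                    snapshot.put(p.toString(), Files.getLastModifiedTime(p).toMillis());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return snapshot;
    }

    /** Compares the stored snapshot with the current one; touches no files and no database. */
    public static Changes diff(Map<String, Long> previous, Map<String, Long> current) {
        Changes changes = new Changes();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long oldStamp = previous.get(entry.getKey());
            if (oldStamp == null) {
                changes.added.add(entry.getKey());      // not seen before
            } else if (!oldStamp.equals(entry.getValue())) {
                changes.modified.add(entry.getKey());   // timestamp differs
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                changes.removed.add(path);              // gone from disk
            }
        }
        return changes;
    }
}
```

The idea would be that the expensive summary only has to be recomputed for the paths in added and modified, and the database only has to be touched for the paths that appear in one of the three sets.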
Which approach is better? Are there any others?
EDIT: Creating the summary is very expensive (full text extraction), and traversing the database is also somewhat expensive, since it is file-based.