I have a utility which goes through and processes a set of files in a directory. The processing is relatively slow (and there are a lot of files), so I've tried to optimise it by only processing files whose "last modified" date is later than the last processing date.

Usually this works well; however, I've found that copying a file doesn't change its last modified date, so there are various scenarios involving copying in which files that have changed are skipped by the process. For example:

  1. The user processes the directory at 9:00.
  2. A file is then copied from this directory and modified so that it has a last modified date of 9:30
  3. The directory is then processed again at 10:00
  4. The modified file is then copied back into the directory at 10:30
  5. Finally the directory is processed again at 11:00

As the last modified date of the file is 9:30, and the directory was last processed at 10:00, the file is skipped when it shouldn't be.
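The filter described above, and why the 11:00 run skips the file, can be sketched in C#. This is a minimal reconstruction under assumptions, not the asker's actual code: `NeedsProcessing` is a hypothetical name, and setting the timestamp with `File.SetLastWriteTime` stands in for copying the edited file back, so the example doesn't depend on how a particular copy tool treats timestamps.

```csharp
using System;
using System.IO;

// Hypothetical filter of the kind the question describes: process a file only
// if its last-modified time is later than the previous run.
bool NeedsProcessing(string path, DateTime lastProcessed) =>
    File.GetLastWriteTime(path) > lastProcessed;

string workDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(workDir);
string file = Path.Combine(workDir, "data.txt");

DateTime today = DateTime.Today;
DateTime lastRun = today.AddHours(10);            // step 3: processed at 10:00

// Step 4: the edited copy lands in the directory. A copy keeps the source's
// last-modified time, simulated here by setting it to 9:30 explicitly.
File.WriteAllText(file, "edited contents");
File.SetLastWriteTime(file, today.AddHours(9.5));

// Step 5: the 11:00 run compares 9:30 against 10:00 and skips the file.
bool picked = NeedsProcessing(file, lastRun);
Console.WriteLine(picked ? "processed" : "skipped");

Directory.Delete(workDir, recursive: true);
```

The program prints `skipped`, even though the file's contents changed after the 10:00 run.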

Unfortunately, the above happens far too often in certain situations (such as a collaborative environment with source control). Clearly my logic is flawed: what I really need is a "last modified or copied" date. Does such a thing exist?

Failing that, is there another way to quickly determine with reasonable reliability if a given file has changed?

A:

Have you thought of computing MD5 checksums of the files and storing them for later comparison? If you're always processing a certain directory, this might be feasible.
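A minimal C# sketch of the checksum approach, assuming the previous run's digests are available as a path-to-hash dictionary (how they are persisted between runs is left open). `Md5Of` and `ChangedFiles` are hypothetical names. MD5 is used here purely as a change detector, not for security.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hex-encoded MD5 digest of a file's contents.
string Md5Of(string path)
{
    using var md5 = MD5.Create();
    using var stream = File.OpenRead(path);
    return Convert.ToHexString(md5.ComputeHash(stream));
}

// Files that are new or whose contents differ from the stored digest.
// 'stored' is updated in place so it can be saved for the next run.
List<string> ChangedFiles(string dir, Dictionary<string, string> stored)
{
    var changed = new List<string>();
    foreach (string path in Directory.EnumerateFiles(dir))
    {
        string digest = Md5Of(path);
        if (!stored.TryGetValue(path, out var previous) || previous != digest)
        {
            changed.Add(path);
            stored[path] = digest;
        }
    }
    return changed;
}

string workDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(workDir);
string file = Path.Combine(workDir, "a.txt");
var digests = new Dictionary<string, string>();

File.WriteAllText(file, "version 1");
int firstRun = ChangedFiles(workDir, digests).Count;   // new file: reported
int secondRun = ChangedFiles(workDir, digests).Count;  // unchanged: not reported

// Change the contents but put the old timestamp back, so the last-modified
// time alone would not reveal the change.
DateTime stamp = File.GetLastWriteTimeUtc(file);
File.WriteAllText(file, "version 2");
File.SetLastWriteTimeUtc(file, stamp);
int thirdRun = ChangedFiles(workDir, digests).Count;   // caught by the digest

Console.WriteLine($"{firstRun} {secondRun} {thirdRun}");
Directory.Delete(workDir, recursive: true);
```

This prints `1 0 1`. The trade-off is that every file is read in full on every run, which is exactly the cost the asker would prefer to avoid.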

WanderingThoughts
Yes, but this is my fallback solution. I'm hoping there is a way to determine whether a given file has changed (with reasonable reliability) without needing to read the file itself.
Kragen
A:

You might want to look at using the FileSystemWatcher class. This class lets you monitor a directory for changes and will fire an event when something is modified. Your code can then handle the event and process the file.

From MSDN:

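// Create a new FileSystemWatcher and set its properties.
FileSystemWatcher watcher = new FileSystemWatcher();
// In the MSDN sample, args comes from Environment.GetCommandLineArgs(), so args[1] is the directory to watch.
watcher.Path = args[1];
/* Watch for changes in LastAccess and LastWrite times, and
   the renaming of files or directories. */
watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
   | NotifyFilters.FileName | NotifyFilters.DirectoryName;
// Only watch text files.
watcher.Filter = "*.txt";

// Add event handlers.
watcher.Changed += new FileSystemEventHandler(OnChanged);
watcher.Created += new FileSystemEventHandler(OnChanged);
watcher.Deleted += new FileSystemEventHandler(OnChanged);
watcher.Renamed += new RenamedEventHandler(OnRenamed);

// Begin watching: no events are raised until this is set.
watcher.EnableRaisingEvents = true;

// Handlers (static methods of the same class in the MSDN sample).
private static void OnChanged(object source, FileSystemEventArgs e) =>
    Console.WriteLine("File: " + e.FullPath + " " + e.ChangeType);

private static void OnRenamed(object source, RenamedEventArgs e) =>
    Console.WriteLine("File: {0} renamed to {1}", e.OldFullPath, e.FullPath);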
Matthew Manela
The trouble is that this is a batch process, so it won't be running all of the time. In fact, if the folder in question is on a mapped network drive, the *computer* might not even be running.
Kragen
A: 

Have you considered adding a process to watch your directory instead, using a FileSystemWatcher? Then you move from a batch process to a real-time system for monitoring your files.

Makach
A:

You can use the FileInfo class to get the required change information (which you might already be using). You need to check two properties of a file: LastWriteTime and CreationTime. If either of them is later than your last processing date, you need to process the file. It is a common misconception that CreationTime is always earlier than LastWriteTime. It's not: if a file is copied to another file, the new file retains the LastWriteTime of the source, but its CreationTime will be the time of the copy.
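A minimal C# sketch of this two-property check, assuming the last processing time is stored in UTC. `ChangedSince` and `FilesToProcess` are hypothetical names. As the comments on this answer point out, a file copied over an existing one can keep the old CreationTime, so this check can still miss some copies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// A file needs processing if it was written OR created (e.g. copied in)
// after the last run. Comparing in UTC avoids daylight-saving ambiguity.
bool ChangedSince(FileInfo file, DateTime lastProcessedUtc) =>
    file.LastWriteTimeUtc > lastProcessedUtc || file.CreationTimeUtc > lastProcessedUtc;

// All files in 'dir' that the batch process should pick up.
List<FileInfo> FilesToProcess(string dir, DateTime lastProcessedUtc) =>
    new DirectoryInfo(dir).EnumerateFiles()
        .Where(f => ChangedSince(f, lastProcessedUtc))
        .ToList();

string workDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(workDir);
File.WriteAllText(Path.Combine(workDir, "new.txt"), "just created");

DateTime now = DateTime.UtcNow;
int sinceAnHourAgo = FilesToProcess(workDir, now.AddHours(-1)).Count;   // 1
int sinceAnHourAhead = FilesToProcess(workDir, now.AddHours(1)).Count;  // 0

Console.WriteLine($"{sinceAnHourAgo} {sinceAnHourAhead}");
Directory.Delete(workDir, recursive: true);
```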

Yogesh
Ah, so close! This nearly works, but unfortunately, if you are overwriting a file, it looks like the creation time is left as the time the *original* file was created.
Kragen
What method are you using to overwrite a file? If a file is overwritten, its CreationTime will be the same as before, but its `LastWriteTime` will change. One of the properties HAS to change.
Yogesh
If you use "copy" from cmd.exe, the destination file's `LastWriteTime` is set to the source file's LWT rather than updated to the current time.
snemarch
Read the answer again. I said that already: "If a file is copied to another file, the new file retains the LastWriteTime of the source but the CreationTime will be the time of the copy." It hardly matters what you use to copy: Explorer, the command prompt, or the System.IO.File.Copy method.
Yogesh
A: 

As you've observed, copying a file over an existing destination file keeps the existing file's CreationTime and sets LastWriteTime to the source file's LastWriteTime, rather than to the current system time at the moment of the copy. Two possible solutions:

  1. Do a delete-and-copy, ensuring the destination's CreationTime will be the system's current time.
  2. Also check the file's Archive attribute, and clear it while processing. When copying source to destination, the destination's Archive (+A) attribute will be set.
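Both suggestions can be sketched in C# as below; `DeleteAndCopy`, `HasArchiveBit` and `ClearArchiveBit` are hypothetical helper names. The Archive attribute is a Windows file-system concept, and .NET on other platforms generally does not report it, so treat option 2 as Windows-only. For option 1, be aware that Windows file system tunneling may give a file recreated under the same name within a short window (15 seconds by default) the deleted file's creation time.

```csharp
using System;
using System.IO;

// Option 1: delete the destination first so the copy is a brand-new file
// and gets a fresh CreationTime (subject to the tunneling caveat in the text).
void DeleteAndCopy(string source, string dest)
{
    File.Delete(dest);          // no exception if dest does not exist
    File.Copy(source, dest);
}

// Option 2 (Windows): the Archive attribute is set when a file is written or
// copied; the batch process clears it after handling the file.
bool HasArchiveBit(string path) =>
    (File.GetAttributes(path) & FileAttributes.Archive) != 0;

void ClearArchiveBit(string path)
{
    FileAttributes attrs = File.GetAttributes(path) & ~FileAttributes.Archive;
    // Use Normal when no other attributes remain.
    File.SetAttributes(path, attrs == 0 ? FileAttributes.Normal : attrs);
}

string workDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(workDir);
string editedFile = Path.Combine(workDir, "edited.txt");
string targetFile = Path.Combine(workDir, "original.txt");
File.WriteAllText(editedFile, "new contents");
File.WriteAllText(targetFile, "old contents");

DeleteAndCopy(editedFile, targetFile);
string copied = File.ReadAllText(targetFile);

ClearArchiveBit(targetFile);            // as the batch process would after handling it
bool stillFlagged = HasArchiveBit(targetFile);

Console.WriteLine($"{copied} / flagged: {stillFlagged}");
Directory.Delete(workDir, recursive: true);
```

With the Archive approach, the process no longer needs to remember a last-run date at all: any file with the attribute set is treated as changed.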
snemarch