We have several .NET applications that monitor a directory for new files, using FileSystemWatcher. The files are copied from another location, uploaded via FTP, etc. When they come in, the files are processed in one way or another. However, one problem that I have never seen a satisfactory answer for is: for large files, how does one know when the files being monitored are still being written to? Obviously, we need to wait until the files are complete and closed before we begin processing them. The event args in the FileSystemWatcher events do not seem to address this.

+1  A: 

Have you tried getting a write lock on the file? If it's being written to, that should fail, and you know to leave it alone for a bit...

Chris Marasti-Georg
+1  A: 

You probably have to go with some out of band signaling: have the producer of "file.ext" write a dummy "file.ext.end".
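A minimal sketch of the receiving side of that convention, assuming the producer writes "file.ext.end" only after "file.ext" is fully written. The class and method names are made up for illustration; only the ".end" suffix comes from the suggestion above.

```csharp
using System.IO;

public static class EndMarker
{
    // Hypothetical helper: the data file counts as complete only once
    // the producer has also written "<path>.end" next to it.
    public static bool IsComplete(string path)
    {
        return File.Exists(path) && File.Exists(path + ".end");
    }

    // Maps a marker such as "file.ext.end" back to "file.ext" by
    // stripping the last extension.
    public static string DataFileFor(string markerPath)
    {
        return Path.ChangeExtension(markerPath, null);
    }
}
```

With this in place the watcher can be pointed at the markers instead of the data files, for example by setting FileSystemWatcher.Filter to "*.end" and calling DataFileFor(e.FullPath) in the Created handler.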

kokos
+3  A: 

The "Changed" event on the FileSystemWatcher shouldn't fire until the file is closed. See my answer to a similar question. There is a possibility that the FTP download mechanism closes the file multiple times during download as new data comes in, but I would think that is a little unlikely.

Kibbee
+3  A: 

If you are in control of the program that is writing the files into the directory, you can have the program write the files to a temporary directory and then move them into the watched directory. The move should be an atomic operation (provided both directories are on the same volume, so that it is a rename rather than a copy), so the watcher shouldn't see the file until it is fully in the directory.
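On the producer side, the write-then-move idea might look roughly like this. It is a sketch only: the class name, method name, and directory layout are assumptions, and the staging directory is assumed to sit on the same volume as the watched one.

```csharp
using System.IO;

public static class StagedDrop
{
    // Moves a fully written file from a staging location into the
    // watched directory and returns its new path. On the same volume
    // File.Move is a rename, so the watcher never sees a partial file.
    public static string Publish(string stagingPath, string watchedDir)
    {
        string target = Path.Combine(watchedDir, Path.GetFileName(stagingPath));
        File.Move(stagingPath, target);
        return target;
    }
}
```

If the two directories are on different volumes, File.Move falls back to copying, and the watcher can again observe a file that is still growing.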

If you are not in control of what is writing to the watched directory, you can set a time in the watcher where the file is considered complete when it has remained the same size for the given time. If immediate processing isn't a concern, setting this timer to something relatively large is a fairly safe way to know that either the file is complete or it never will be.
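The "unchanged size for a given time" check could be sketched as below. The names, the polling approach, and the timeout parameter are illustrative assumptions, not part of FileSystemWatcher itself.

```csharp
using System;
using System.IO;
using System.Threading;

public static class SizeStability
{
    // Returns true once the file has kept the same length for at least
    // quietPeriod; returns false if that never happens before timeout.
    public static bool WaitUntilStable(string path, TimeSpan quietPeriod,
                                       TimeSpan pollInterval, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        long lastSize = -1;                  // -1 means "not seen yet"
        DateTime lastChange = DateTime.UtcNow;

        while (DateTime.UtcNow < deadline)
        {
            // A fresh FileInfo each time avoids reading cached metadata.
            FileInfo info = new FileInfo(path);
            long size = info.Exists ? info.Length : -1;

            if (size != lastSize)
            {
                // Size moved (or the file just appeared): restart the clock.
                lastSize = size;
                lastChange = DateTime.UtcNow;
            }
            else if (size >= 0 && DateTime.UtcNow - lastChange >= quietPeriod)
            {
                return true;
            }
            Thread.Sleep(pollInterval);
        }
        return false;
    }
}
```

A stalled upload looks exactly like a finished one to this check, which is why a generous quiet period (minutes rather than seconds) is the safer choice when latency does not matter.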

Ryan Ahearn
+2  A: 

Unless the contents of a file can be verified for completion (it has a verifiable format or includes a checksum of the contents) only the sender can verify that a whole file has arrived.

I have used a locking method for sending large files via FTP in the past.

File is sent with an alternative extension and is renamed once the sender is happy it is all there.

The above is obviously combined with a process which periodically tidies up old files with the temporary extension.

An alternative is to create a zero-length file with the same name but with an additional .lck extension. Once the real file is fully uploaded the .lck file is deleted. The receiving process obviously ignores files which have the name of a lock file.

Without a system like this the receiver can never be sure that the whole file has arrived.

Checking for files that haven't been changed in x minutes is prone to all sorts of problems.
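For the .lck variant described above, the receiver's filtering step might look something like this. It is a sketch under the assumption that the lock file is named "<file>.lck" and lives in the same directory; the class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LockFileFilter
{
    // Lists files in the directory that are neither lock files
    // themselves nor accompanied by a "<name>.lck" lock file.
    public static List<string> ReadyFiles(string directory)
    {
        List<string> ready = new List<string>();
        foreach (string path in Directory.GetFiles(directory))
        {
            if (path.EndsWith(".lck", StringComparison.OrdinalIgnoreCase))
                continue;   // the lock file itself is never processed
            if (File.Exists(path + ".lck"))
                continue;   // sender has not released this file yet
            ready.Add(path);
        }
        return ready;
    }
}
```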

Matt Lacey
A: 

Thanks for all the quick answers; the Stack Overflow community rocks!

Generally we don't have control over what is being written to the watched directory. I agree that when you do have such control, writing a separate, small lock file of some kind, or moving/renaming the file once it's completed writing, certainly works (and I have used these techniques myself). At the moment, our biggest need is watching an incoming FTP directory, where large files are being uploaded. I like the technique of attempting to get a write lock on the file; I'll pass that along to our other developers.

glaxaco
A: 

+1 for using a file.ext.end signaler if possible, where the contents of file.ext.end is a checksum for the larger file. This isn't for security so much as it is to make sure nothing got garbled along the way. If someone can insert their own file into the large stream they can replace the checksum as well.
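A rough sketch of that check on the receiving side. The answer does not say which checksum to use, so SHA-256 written as a hex string into "file.ext.end" is purely an assumption here, as are the class and method names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChecksumMarker
{
    // Hashes the file and returns the digest as lowercase hex.
    public static string ComputeSha256Hex(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // True only if "<path>.end" exists and its contents match the
    // digest of the data file (assumed format: a single hex string).
    public static bool Verify(string path)
    {
        string markerPath = path + ".end";
        if (!File.Exists(path) || !File.Exists(markerPath))
            return false;

        string expected = File.ReadAllText(markerPath).Trim();
        return string.Equals(expected, ComputeSha256Hex(path),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

As noted above, this guards against truncation and corruption in transit, not against someone who can write to the directory.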

Joel Coehoorn
A: 

A write lock doesn't help if the file upload failed part way through and the sender hasn't tried resending (and relocking) the file yet.

Matt Lacey
A: 

The way I check in Windows if a file has been completely uploaded by ftp is to try to rename it. If renaming fails, the file isn't complete. Not very elegant, I admit, but it works.
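The rename probe could be sketched along these lines. The ".ready" suffix and the names are invented for the example, and the behaviour relies on Windows refusing to rename a file that another process still holds open without delete sharing; other platforms may allow the rename regardless.

```csharp
using System;
using System.IO;

public static class RenameProbe
{
    // Tries to rename the file to "<path>.ready". Returns the new path
    // on success, or null if the rename was refused (file missing,
    // still open elsewhere, target already exists, or access denied).
    public static string TryClaim(string path)
    {
        string claimed = path + ".ready";
        try
        {
            File.Move(path, claimed);
            return claimed;
        }
        catch (IOException)
        {
            return null;
        }
        catch (UnauthorizedAccessException)
        {
            return null;
        }
    }
}
```

Renaming to a new name has a side benefit: the file is taken out of the set the watcher keeps retrying, so each upload is claimed once.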

Kaniu
A: 

The following method repeatedly tries to open a file with read/write access. It blocks until the open succeeds, which normally means the writer has finished and closed the file:

using System.Diagnostics;
using System.IO;
using System.Threading;

/// <summary>
/// Blocks until the file can be opened with read/write access,
/// retrying every 500 ms.
/// </summary>
public static void WaitReady(string fileName)
{
    while (true)
    {
        try
        {
            // File.Open throws on failure rather than returning null,
            // so reaching the body of the using block means success.
            using (Stream stream = File.Open(fileName, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite))
            {
                Trace.WriteLine(string.Format("Output file {0} ready.", fileName));
                return;
            }
        }
        catch (IOException ex)
        {
            // Sharing violations land here, as does FileNotFoundException
            // (a subclass of IOException) when the file has not appeared yet.
            Trace.WriteLine(string.Format("Output file {0} not yet ready ({1})", fileName, ex.Message));
        }
        catch (UnauthorizedAccessException ex)
        {
            Trace.WriteLine(string.Format("Output file {0} not yet ready ({1})", fileName, ex.Message));
        }
        // Wait before the next attempt.
        Thread.Sleep(500);
    }
}

(from my answer to a related question)

0xA3