We have a C# Windows service that polls a folder, waiting for a file to arrive via FTP. To avoid using the file while it is still being written, we first try to get a lock on it. However, there seem to be occasions where we get the lock after the file has been created but before anything has been written to it, so we end up opening an empty file.

Is there any reliable way to tell whether the FTP transfer is complete?

+4  A: 

I'm always a big fan of the .filepart protocol, so that no matter what transfer protocol you use (FTP, SSH, rsync, etc.) you have the same understanding.

This isn't a direct answer to your question, but instead of searching for an FTP-only solution, a more generic solution could be better for you in the long run.

(.filepart: rename the file, test.txt, to test.txt.filepart for the transfer; when it is done, rename it back to test.txt.)
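A minimal sketch of that convention, assuming the sender can write under a temporary name and rename at the end. It is written in Python to keep it short and uses the local filesystem as a stand-in for the transfer; `deliver`, `ready_files` and `PART_SUFFIX` are hypothetical names, not part of any library.

```python
import os
from pathlib import Path

PART_SUFFIX = ".filepart"  # naming convention described above


def deliver(data: bytes, dest_dir: Path, name: str) -> Path:
    """Write under a temporary name, then rename once everything is written."""
    part = dest_dir / (name + PART_SUFFIX)
    final = dest_dir / name
    with open(part, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    os.replace(part, final)  # one rename step on the same filesystem
    return final


def ready_files(watch_dir: Path) -> list[str]:
    """Names the consumer may open: every file without the partial suffix."""
    return sorted(
        p.name
        for p in watch_dir.iterdir()
        if p.is_file() and not p.name.endswith(PART_SUFFIX)
    )
```

The consumer only ever calls `ready_files`, so a half-written upload is simply invisible to it. Over FTP the final step would be the client's rename command rather than `os.replace`.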

cbrulak
+4  A: 

You could possibly change the filename before upload, then rename it after it's done. That way it will look like it doesn't exist until it is finished.

John Boker
A: 

Rather than polling, you might want to have a look at System.IO.FileSystemWatcher.

Jeremy
This will tell you when a file is created or modified, not when it's done being written to.
JoshBerke
But combined with a bit of ingenuity, you can figure out when the file hasn't been changed for a reasonable amount of time...
Greg B
Right, he doesn't say that he has control over the sending process. Otherwise, obviously, you'd do something like send a checksum file after the main file and use that as your trigger. My suggestion doesn't solve his problem completely; it's just a suggestion.
Jeremy
I had looked at the System.IO.FileSystemWatcher, however, because it only fires on certain events I can see occasions where, if my service was down for any reason, I could miss files.
Graham Miller
+5  A: 

A practice I've seen is to transfer two files: one which is the actual file, then a second one which we can call a .done file. The idea is that as soon as you see the .done file, you know the first file should be done.

Other options include watching the file for modifications and waiting for a certain amount of time with no modifications. Of course, this is not foolproof.

Edit

Kyle makes a good point that adding a checksum to the .done file and/or indicating the size of the first file is a good protection against fringe cases.
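A rough sketch of the receiving side, under two assumptions that are not in the answer: the marker is named `<payload>.done`, and it contains the payload's size in bytes as decimal text. `completed_payload` and `scan` are hypothetical helper names, and Python is used only for brevity.

```python
from pathlib import Path
from typing import Optional

DONE_SUFFIX = ".done"  # assumed marker naming: "<payload>.done"


def completed_payload(done_file: Path) -> Optional[Path]:
    """Return the payload path if its size matches the marker, else None."""
    payload = done_file.with_name(done_file.name[: -len(DONE_SUFFIX)])
    try:
        expected = int(done_file.read_text().strip())
        actual = payload.stat().st_size
    except (OSError, ValueError):
        # Marker unreadable or malformed, or payload missing: not ready yet.
        return None
    return payload if actual == expected else None


def scan(watch_dir: Path) -> list[Path]:
    """Payloads whose .done marker is present and whose size checks out."""
    markers = sorted(watch_dir.glob("*" + DONE_SUFFIX))
    found = (completed_payload(m) for m in markers)
    return [p for p in found if p is not None]
```

Because the size is checked as well, a marker that becomes visible before the payload has been fully written is treated as "not ready" and picked up on a later scan.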

JoshBerke
I like the done file for another reason, which is that if you have a transfer that is interrupted, it becomes impossible to tell the difference between that and a completed file. But an interrupted transfer will not send a .done file.
scwagner
Excellent point: a broken transfer will still register with the polling approach as complete. I use the done file for large files; for very small files we never ran into a false positive, which is not to say it couldn't happen...
JoshBerke
A good way to mark broken transfers is by including a header and a footer. If the footer doesn't match the header, the file is not complete.
scottm
Yep, that would work well... combine that with a good polling algorithm triggered using FileSystemWatcher and you've got a decent solution. A .done file is still better IMHO. You don't have to open the file to check if it's done...
JoshBerke
I like this idea. Unfortunately I don't have any control over the FTP delivery, but it's something to remember for next time.
Graham Miller
Depending on the order of delivery, the .done file could arrive (or at least be observable) before the original file has finished transferring (or being written to disk), even if the sender thinks it sent it afterwards. Including a checksum/size in the .done file mitigates this issue.
Kyle Burton
This is definitely the best way to go. We've used it on large applications supporting distributed analysis of data packets with great success.
consultutah
A: 

What about using a folder watcher to index the contents? If a file's size does not change within 5 minutes, you can pretty much guarantee the upload has finished.

The timeout could be tied to the timeout of your FTP server too.

http://www.codeproject.com/KB/files/MonitorFolderActivity.aspx
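As a sketch of the "unchanged for N minutes" check (not the code from the linked article), the tracker below remembers each file's size and modification time and reports a file only once both have stayed the same for a quiet period. `StabilityTracker` and its parameters are hypothetical names, Python is used only for brevity, and the clock can be injected so the logic can be exercised without waiting.

```python
import time
from pathlib import Path
from typing import Optional


class StabilityTracker:
    """Reports files whose size and mtime have not changed for a quiet period."""

    def __init__(self, quiet_seconds: float = 300.0):
        self.quiet_seconds = quiet_seconds
        # path -> ((size, mtime_ns), time this signature was first observed)
        self._seen: dict[Path, tuple[tuple[int, int], float]] = {}

    def poll(self, watch_dir: Path, now: Optional[float] = None) -> list[Path]:
        now = time.monotonic() if now is None else now
        stable = []
        current = {}
        for p in sorted(watch_dir.iterdir()):
            if not p.is_file():
                continue
            st = p.stat()
            sig = (st.st_size, st.st_mtime_ns)
            prev = self._seen.get(p)
            # Restart the quiet-period clock whenever the signature changes.
            since = prev[1] if prev and prev[0] == sig else now
            current[p] = (sig, since)
            if now - since >= self.quiet_seconds:
                stable.append(p)
        self._seen = current  # forget files that have disappeared
        return stable
```

A polling service would call `poll` on a timer and move each returned file out of the folder before processing it; otherwise the same file is reported again on the next poll. This remains a heuristic: a stalled or abandoned transfer looks exactly like a finished one.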

Greg B
as @Josh says, it's not perfect
Greg B
As I don't have control over the FTP delivery, I think this is what I will have to do. I agree it's not perfect, but for now it's the accepted answer.
Graham Miller
This technique doesn't account for partial transfers. Five minutes _probably_ exceeds the TCP timeout, but that's not the only concern in these kinds of transfer protocols. Size sometimes isn't even enough, since some OSs preallocate the entire file, leaving the untransferred part as nulls.
Kyle Burton
True, but Graham is quite vague so I guess the procedure around validating a file and if the entire size is allocated on disk is an intricacy he will have to deal with. The folder watcher idea was just that. A possible solution to his problem in a given situation
Greg B
+3  A: 

I've always used a checksum file. So you send a checksum file that denotes the file size and the checksum. You'll know the file is uploaded correctly when the checksum in the checksum file matches the actual checksum of the file on the file system.
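A small sketch of that comparison. The checksum file's format is an assumption made for illustration: one line holding the size in bytes and a SHA-256 hex digest, separated by whitespace. `sha256_of` and `is_complete` are hypothetical helper names, and Python is used only for brevity.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large uploads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_complete(payload: Path, checksum_file: Path) -> bool:
    """True only if the declared size and SHA-256 both match the payload."""
    try:
        # Assumed format: "<size in bytes> <sha256 hex digest>"
        size_text, digest = checksum_file.read_text().split()
        if payload.stat().st_size != int(size_text):
            return False  # cheap check first; skip hashing a short file
        return sha256_of(payload) == digest.lower()
    except (OSError, ValueError):
        return False  # missing or malformed file: treat as not complete
```

Checking the size before hashing keeps repeated polls cheap while a large file is still arriving.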

Rob Di Marco
+1  A: 

The method I've used in the past is a mix of some of the other replies here, i.e. FTP the file using a different extension from the one expected (e.g. FILENAME.part), then rename it with the proper extension as the last step of uploading.

On the server, use a FileSystemWatcher to look for new files with the correct extension. The FSW will not see the file until it's renamed, and the renaming operation is atomic so the file will be complete and available the moment it's been renamed.

Renaming or moving files of course relies on you having control over the uploading process.

If you do not have any control over how the files are uploaded, you will be stuck with using the FSW to know a file is being uploaded, then monitoring its size. When it has been unchanged for a long period of time, you may be able to assume it's complete.

Merak
+1  A: 

Everybody, you really, really have to read the 'Tales from the interview' article on The Daily WTF, more precisely, the 2nd story ("I Guess That Would Work, Too")

It gives the answer to this question, you may learn a thing or two about interviews, and you'll probably laugh to tears - I know I did when I read it :-).

Cristi Diaconescu