File systems are volatile. This means that you can't trust the result of one operation to still be valid for the next one, even if it's the next line of code. You can't just say if (some file exists and I have permissions for it) open the file, and you can't say if (some file does not exist) create the file. There is always the possibility that the result of your if condition will change in between the two parts of your code. The operations are distinct: not atomic.

To make matters worse, the nature of the problem means that if you're tempted to make this check, odds are you're already worried or aware that something you don't control is likely to happen to the file. The nature of development environments makes this event less likely to happen during your testing and very difficult to reproduce. So not only do you have a bug, but the bug won't show up while testing.

Therefore under normal circumstances the best course of action is to not even try to check if a file or directory exists. Instead, put your development time into handling exceptions from the file system. You have to handle these exceptions anyway, so this is a much better use of your resources. I even have a well-voted answer to this effect in another question.
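To make the "handle the exception instead of checking first" idea concrete, here is a minimal sketch. It is written in Python only because the question is language-agnostic; the function name read_settings and the choice to fall back to a default value are assumptions made for illustration, not something from the question.

```python
def read_settings(path, default=""):
    """Attempt the open directly; a missing file is just one handled outcome."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        # The file was absent at the moment of the open. An earlier
        # existence check could not have promised anything stronger.
        return default
    except PermissionError:
        # The file may exist but is not readable by this process.
        return default
```

There is no window between a check and the open here, because there is no separate check: the open itself reports what the file system looked like at that instant.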

But I'm having some doubts. In .Net, for example, if that were really always true, the .Exists() methods wouldn't be in the API in the first place. Also consider scenarios where you expect your program to need to create the file. The first example that comes to mind is a desktop application. This application installs a default user-config file to its home directory, and the first time each user starts the application it copies this file to that user's application data folder. It expects the file not to exist on that first startup.

So when is it acceptable to check in advance for the existence (or other attributes, like size and permissions) of a file? Is expecting failure rather than success on the first attempt a good enough rule of thumb?

+2  A: 

It depends on your requirements, but one way is to try to obtain an exclusive open file handle, with some sort of retry mechanism. Once you have that handle, it's going to be hard (or impossible) for another process to delete (or move) that file.

I've used code in .NET similar to the following to obtain an exclusive file handle, where I expect some other process to be possibly writing the file:

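// Requires: using System.IO; using System.Threading;
// fullFilePath and maxAttempts are assumed to be supplied by the caller.
FileInfo fi = new FileInfo(fullFilePath);
FileStream fs = null;

int attempts = maxAttempts;
do
{
    try
    {
        // Ask to open for reading with exclusive access (FileShare.None).
        fs = fi.Open(FileMode.Open, FileAccess.Read, FileShare.None);
    }
    // Ignore any errors and fall through to the retry below...
    catch {}

    if (fs != null)
    {
        break;
    }
    else
    {
        // Give the other process time to release the file.
        Thread.Sleep(100);
    }
}
while (--attempts > 0);
// fs is still null here if every attempt failed.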
Mitch Wheat
While this is true on Windows, it is not true on *nix. Root, and in many cases regular users, can delete or move files at will, regardless of how they're open. Often it doesn't even affect the handle. Windows file locking is so annoying when an app won't let it go... please don't advocate more of it.
rmeador
@rmeador: don't be ridiculous. There are times when an exclusive lock is necessary!
Mitch Wheat
Coming from a Unix background, the only reason file locking like Windows does is ever needed is when you have a system that considers a file name and a file inseparable. Unix doesn't.
derobert
+1  A: 

In a *nix environment, a well-established method for checking whether another copy of the program is already running is to create a lock file. So the check for file existence is used to verify this.

Sunny
Wouldn't you normally do open ( ... O_CREAT|O_EXCL ) because you want to obtain the lock if it doesn't exist?
derobert
@derobert: In this case, though, *NIX is checking to see if a program is running. Checking to see if the file exists may very well be implemented by using open with O_CREAT|O_EXCL .
R. Bemrose
Checking to see if a program is running takes more — you actually have to open the pid file, read it, and check if that process still exists. So you'd just do an open without O_CREAT.
derobert
A: 

If you're that concerned about somebody else removing the file, perhaps you should implement some sort of locking system. For instance, I used to work on the code for C-News, a Usenet news server. Since a lot of the things it did could happen asynchronously, it would "lock" a file or a directory by making a temp file, and then hard linking it to a file named "LOCK". If the link failed, it would mean that some other version of the program was writing to that directory, otherwise it was yours and you could do what you like.

The nifty thing about this is that most of the program was written in shell and awk, and this was a very portable locking mechanism. Also, the lock file would contain the PID of the owner, so you could look at the existing lock file to see if the owner was still running.
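A rough sketch of that temp-file-plus-hard-link scheme follows. It is not the C-News code (which was shell and awk); it is a Python illustration under assumptions, and the function names try_lock and unlock are made up for the example.

```python
import os
import tempfile


def try_lock(directory):
    """Return True if we obtained directory/LOCK, False if someone else holds it."""
    lock_path = os.path.join(directory, "LOCK")
    # Write our PID to a uniquely named temp file in the same directory.
    fd, temp_path = tempfile.mkstemp(prefix="lock.", dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        try:
            # Creating the link fails if LOCK already exists,
            # so at most one contender can succeed.
            os.link(temp_path, lock_path)
        except FileExistsError:
            return False
        return True
    finally:
        # The temp name is no longer needed; LOCK (if created) keeps the data.
        os.remove(temp_path)


def unlock(directory):
    """Release the lock by removing directory/LOCK."""
    os.remove(os.path.join(directory, "LOCK"))
```

The decision is made by the link call itself rather than by a prior existence test, which is what keeps it free of the check-then-act race. Reading the PID back out of LOCK to detect a dead owner is left out of this sketch.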

Paul Tomblin
A: 

We have a diagnostic tool that has to gather a set of files, installer log included. Depending on different conditions the installer log can be in one of two folders. Even worse, there can be different versions of the log in both of these folders. How does the tool find the right one?

It's quite simple if you check for existence. If only one is present, grab that file. If both exist, find which has the latest modification time and grab that file. That's just the normal way of doing things.
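As a sketch of that selection logic (in Python, with the hypothetical function name newest_existing), the existence check and the modification-time lookup can be folded into a single stat call per candidate:

```python
import os


def newest_existing(candidates):
    """Return the candidate path with the latest modification time, or None."""
    best_path, best_mtime = None, None
    for path in candidates:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            # Missing or otherwise unreadable metadata: skip this candidate.
            continue
        if best_mtime is None or mtime > best_mtime:
            best_path, best_mtime = path, mtime
    return best_path
```

The result is still only a snapshot, so whatever later opens the chosen file has to cope with it having disappeared in the meantime.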

sharptooth
A: 

While this is a language-agnostic post, it seems you are talking about .NET. Most systems (.NET and others) have more detailed APIs in order to figure out if the file exists when opening the file.

What you should do is make a call to access the file, as it will typically indicate through some sort of error that the file doesn't exist (if it truly doesn't). In .NET, you would have to go through the P/Invoke layer and use the CreateFile API function. If that function returns an error of ERROR_FILE_NOT_FOUND, then you know that the file does not exist. If it returns successfully, then you have a handle that you can use.

The point here is that it is a somewhat atomic operation, which ultimately is what you are looking for.

Then, with the handle, you can pass it to a FileStream constructor and perform your work on the file.

casperOne
Uh, at least now, the question is tagged language-agnostic. So it doesn't make a lot of sense to assume .NET.
unwind
No, he makes a good point. I do come from a .Net background, with .Net making up most (not all) of my file system API experience. However, I do feel the problem extends outside of .Net.
Joel Coehoorn
casperOne
+1  A: 

This may be too simplistic, but I would think the primary reason for checking for the existence of a file (hence the existence of .Exists()) would be to prevent unintended overwrites of existing files, not to avoid exceptions caused by attempting to access non-existent or inaccessible files.

EDIT 2

This was, in fact, too simplistic and I recommend you see Stephen Martin's response.

cmsjr
Not sure about the .Net API, but surely it has something like Unix's O_EXCL|O_CREAT (create the file, but only if it doesn't exist). Which is atomic, so it can't be fooled by someone creating the file in the middle of the test.
derobert
Possibly, but at least for the System.IO.File class, if you make a call to Create and the file already exists, it will overwrite it without notification or exception (unless another exception is thrown, e.g. file was read only)
cmsjr
@cmsjr: that depends on which method you call and which params you use. @derobert: I'd rather not program by exceptions, and avoid them if I can. If I want to avoid overwriting a config file, for example, checking if it exists is a good step. Much better than trying to create it and catching
JoshBerke
No good: a file could be created between when you check and when you go to create it.
Joel Coehoorn
True, but I would argue that files that are volatile enough to be created between .Exists and .Create are the kind of files you can overwrite with less concern than files like SomeIrretrievableAndImportantData.zip.
cmsjr
@Josh, I don't see a parameter to indicate do not overwrite if exists, and I can't imagine how it could be done without wrapping .Exists.
cmsjr
File.Open(@"...", FileMode.CreateNew); Will throw an exception if the file exists. I don't know if it is atomic though.
Samuel
Any of the File.Open or FileStream constructors that take a FileMode parameter can test for the existence of the file and open/create it atomically. To avoid overwriting an existing file use File.Open("path", FileMode.OpenOrCreate).
Stephen Martin
@cmsjr:System.IO.File.Open("path",System.IO.FileMode.CreateNew)
JoshBerke
@Stephen, I am unclear on how it can be atomic if it has to do a read to see if the file exists followed by a possible write if the file does not exist.
cmsjr
It is atomic because the test for existence and the open/create are performed at the file system level and the file system ensures that it is atomic.
Stephen Martin
So Open with an applicable FileMode specified could detect the creation of a file that would be missed by calling Exists immediately followed by Create?
cmsjr
So at this point, I really think Stephen's answer is correct. What's the protocol, should I delete mine?
cmsjr
Strictly speaking it depends on the file system. But in general using FileMode.OpenOrCreate means that the file system checks its index for the file and if it exists it returns a handle to it. If it does not exist it allocates the index record to create the file and returns a handle to it...
Stephen Martin
Locking on the index and handle table keeps any other process from intervening. Don't delete your answer; the comments may be useful to somebody, and the answer isn't terribly wrong, just a little naive about file systems.
Stephen Martin
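For readers following the comment thread above, this is roughly what the atomic "create only if it does not exist" operation looks like when expressed with the O_CREAT|O_EXCL flags derobert mentions. It is a Python sketch rather than .NET code, and the helper name create_if_absent is an assumption for illustration.

```python
import os


def create_if_absent(path, data):
    """Create path containing data only if it does not exist; return True if created."""
    # O_CREAT | O_EXCL asks the file system to test and create in one step.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    try:
        fd = os.open(path, flags, 0o644)
    except FileExistsError:
        # Someone (possibly another process, a moment ago) already created it.
        return False
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return True
```

Because the existence test and the creation are a single request to the file system, there is no gap for another process to slip into, which is the distinction being drawn against calling Exists and then Create.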
A: 

There are a number of applications you may well be writing for which a simple File.Exists is more than adequate. If it's a config file that only your application will use, then you do not need to go overboard with your exception handling.

Whilst the "flaws" you have pointed out in using this method are all valid, it doesn't mean they are not acceptable flaws for some situations.

Robin Day
+1  A: 

I think the check makes sense when you want to be sure the file was there in the first place. Take settings files, as you said: if there is a file, I will try to merge the existing settings instead of blowing them away.

Other cases would be when a user tells me to do something with a file. Yes, I know the openFileDialog will check if a file exists (but this is optional). I vaguely remember that back in VB6 this was not the case, so verifying that the file they just told me to use existed was common.

I'd rather not program by exception.

Edit

I didn't miss the point. You might try and access the file, an exception is thrown and then when you go to create the file, the file was already placed there. Which now causes your exception handling code to go on the fritz. So I guess we could then have an exception handler in our exception handler to catch that the file changed yet again...

I'd rather try and prevent exceptions, not use them to control logic.

Edit

Additionally, another time to check attributes such as size is when you're waiting for a file operation to finish. You never know for sure, but with a good algorithm, and depending on the system writing the file, you may be able to handle a good many cases. (I had a system running for five years that watched for small files coming over FTP. It used the same API as the file system watcher, then started polling until the file stopped changing before raising an event that the file was ready to be consumed.)
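A minimal sketch of that kind of polling is shown below, in Python. The function name wait_until_stable, its parameters, and the default values are all assumptions for illustration; the original system used the .NET file system watcher API, which is not reproduced here.

```python
import os
import time


def wait_until_stable(path, interval=0.5, quiet_checks=3, timeout=60.0):
    """Return True once size and mtime stay unchanged for quiet_checks polls in a row."""
    deadline = time.monotonic() + timeout
    last, unchanged = None, 0
    while time.monotonic() < deadline:
        try:
            info = os.stat(path)
            current = (info.st_size, info.st_mtime)
        except FileNotFoundError:
            current = None  # Not there (yet); keep waiting.
        if current is not None and current == last:
            unchanged += 1
            if unchanged >= quiet_checks:
                return True
        else:
            unchanged = 0
        last = current
        time.sleep(interval)
    return False
```

This is a heuristic: a writer that pauses for longer than interval times quiet_checks will look finished when it is not, so the consumer still needs to handle a file that turns out to be incomplete.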

JoshBerke
+1 - I'm going to delete my answer in favor of yours, which I think captures the essence of the question :)
Doug L.
You missed the point: if you want to be sure the file is there, the operations to check the file and then use it are not atomic. Something can happen in the interval.
Joel Coehoorn
+1  A: 

I'd only check it if I expect it to be missing (e.g. the application settings) and only if I have to read the file.

If I have to write to the file, it's either a logfile (so I can just append to it or create a new one) or I replace the contents of it, so I might as well recreate it anyway.

If I expect that the file exists, it would be right that an Exception is thrown. Exception handling should then inform the user or perform recovery. My opinion is that this results in cleaner code.

File protection (i.e. not overwriting possibly important files) is different; in that case I'd always check whether a file exists, if the framework doesn't do that for me (think SaveFileDialog).

Lennaert
A: 

I think anytime that you know that the file may or may not exist and you want to perform some alternate action based on the existence of the file, you should do the check because in this case it's not an exceptional condition for the file to not exist. This won't absolve you from having to handle exceptions -- from someone else either removing or creating the file between the check and your open -- but it makes the intent of the program clear and doesn't rely on exception handling to perform flow-control logic.

EDIT: An example might be log rotation on start up.

  try
  {
      if (File.Exists("app.log"))
      {
          RotateLogs();
      }

      // CreateNew throws IOException if app.log already exists at this point.
      log = File.Open("app.log", FileMode.CreateNew);
  }
  catch (IOException)
  {
      // ...another writer, perhaps?
  }
  catch (UnauthorizedAccessException)
  {
      // ...maybe I should have used runas?
  }
tvanfosson
That doubles the amount of code and adds in a seldom-used error path. Sounds quite likely to give rise to bugs, to me.
derobert
No. Just have the try/catch be the outer block, catching different exception types as appropriate, then do your conditional logic inside the try block. How is this doubling code -- you already want to do something different based on existence in my scenario?
tvanfosson
@tvanfosson: Your code example clarifies greatly, thank you.
derobert
A: 

One example: You may be able to check for existence of files which you are unable to open (due to, for example, permissions).

Another, possibly better example: You want to check for the existence of a Unix device file, but you definitely do not want to open it, because opening it has side effects (e.g., opening and closing /dev/st0 will rewind the tape).
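Both examples come down to asking the file system for metadata instead of opening the file. A small Python sketch of that, with the hypothetical helper name describe and made-up return labels:

```python
import os
import stat


def describe(path):
    """Classify a path from its metadata alone, without ever opening it."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        # Typically a parent directory that we are not allowed to search.
        return "inaccessible"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "file"
    return "other"
```

A stat call needs search permission on the containing directories but not read permission on the file itself, and it does not trigger the side effects that opening a device node can have.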

derobert
A: 

A variety of apps include built-in web servers. It's common for them to generate self-signed SSL certificates the first time they start up. A straightforward way to implement this would be to check whether the cert exists on startup, and create it if not.

In theory, it could exist for the check, and not exist later. In that case, we'd get an error when we try to listen, but that can be handled quite easily and is not a big deal.

It's also possible that it doesn't exist for the check, and exists later. In that case, it either gets overwritten with a new cert, or writing the new cert fails, depending on your policy. The first is a little annoying, in terms of the cert change causing some alarm, but also not really critical, especially if you do a bit of logging to indicate what is going on.

And, in practice, both cases are extraordinarily unlikely to ever come up.

DNS
derobert
Because it's library code that actually reads the file.
DNS
A: 

As you pointed out, it's always important what the program should do if the file is missing. In all my applications the user can always delete the config file and the application will create a new one with default values. No problem. I also ship my applications without config files.

But users tend to delete files, even files they should not delete, like serial keys and template files. I always check for these files because without them the application is unable to run at all. I cannot create a new serial key from defaults.

What should happen when the file is missing? You can do a file find or use an exception handler, but the real question is: what will happen when the file is missing? Or, how important is the file to the application? I check all the time before I try to access any support files for the app. Additionally, I do error handling in case the file is corrupt and cannot be loaded.

Holli
A: 

I think the reason for "Exists" is to determine when files are missing without the need for creating all the OS housekeeping data required to access the file or having exceptions being thrown. So it's a file handling optimisation more than anything else.

For a single file, the saving the "Exists" gives is generally insignificant. If you were checking if a file exists many, many times (for example, searching for #include files) then the saving could be significant.

In .Net, the specification for File.Exists doesn't list any exceptions that the method might throw, unlike for example File.Open which lists nine exceptions, so there's certainly less checking going on in the former.

Even if "Exists" returns true, you still need to handle exceptions when opening the file, as the .Net reference suggests.

Skizz

+26  A: 

The File.Exists method exists primarily for testing for the existence of a file when you do not intend to open the file. For example testing for the existence of a locking file whose very existence tells you something but whose contents are immaterial.

If you are going to open the file then you will need to handle any exception regardless of the results of any prior calls to File.Exists. So, in general, there is no real value in calling it in these circumstances. Just use the appropriate FileMode enumeration value in your open method and handle any exceptions, as simple as that.

EDIT: Even though this is couched in terms of the .Net API, it is based on the underlying system API. Both Windows and Unix have system calls (e.g. CreateFile on Windows, open on Unix) that use the equivalent of the FileMode enumeration. In fact in .Net (or Mono) the FileMode value is just passed through to the underlying system call.

Stephen Martin
Well, you've convinced me +1
cmsjr
A: 

To answer my own question (in part), I want to expand on the example I used: a default config file.

Rather than check if it exists at app startup and try to copy the file if the check fails, the thing to do is always try to copy the file. You just do it in such a way that the copy will fail if the file exists rather than replace an existing file. This way all you need to do is catch and ignore any exception thrown if the copy fails because of an existing file.
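A sketch of that "always try the copy, let it fail if the target exists" approach, in Python. The function name install_default_config is hypothetical, and returning a boolean instead of silently ignoring the failure is a choice made for the example.

```python
import shutil


def install_default_config(default_path, user_path):
    """Copy default_path to user_path unless user_path already exists."""
    try:
        # Mode "x" means exclusive creation: the open fails if user_path exists,
        # so an existing user config is never replaced.
        with open(default_path, "rb") as src, open(user_path, "xb") as dst:
            shutil.copyfileobj(src, dst)
    except FileExistsError:
        # Not the first run for this user; nothing to do.
        return False
    return True
```

It can be called unconditionally on every startup. One caveat of this sketch: if the copy is interrupted partway, a truncated file is left behind and later runs will treat it as an existing config.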

Joel Coehoorn
A: 

Your problem could easily be solved with basic computer science... read up on Semaphores.

(I did not mean to sound like a jerk, I was just pointing you to a simple answer for a common problem).

Mike Curry