My application traverses a directory tree and in each directory it tries to open a file with a particular name (using File.OpenRead()). If this call throws FileNotFoundException then it knows that the file does not exist. Should I instead call File.Exists() first to check whether the file exists? Would this be more efficient?

A: 

Yes, exceptions are expensive

bigtang
Context, man, context!
Michael Petrotta
Compared to the file I/O, the exception isn't expensive at all.
Harry Steinhilber
+19  A: 

Update

I ran these two methods in a loop and timed each:

void throwException()
{
    try
    {
        throw new NotImplementedException();
    }
    catch
    {
    }
}

void fileOpen()
{
    string filename = string.Format("does_not_exist_{0}.txt", random.Next());
    try
    {
        File.Open(filename, FileMode.Open);
    }
    catch
    {
    }
}

void fileExists()
{
    string filename = string.Format("does_not_exist_{0}.txt", random.Next());
    File.Exists(filename);
}

Random random = new Random();

These are the results from a release build, run without the debugger attached:

Method          Iterations per second
throwException                  10100
fileOpen                         2200
fileExists                      11300

The cost of throwing an exception is a lot higher than I was expecting, and calling File.Open on a file that doesn't exist seems much slower than checking the existence of a file that doesn't exist.

In the case where the file will often not be present, it appears to be faster to check whether the file exists. I would imagine that in the opposite case, when the file is usually present, you will find it is faster to catch the exception. If performance is critical to your application, I suggest that you benchmark both approaches on realistic data.

As mentioned in other answers, remember that even if you check for the existence of the file before opening it, you should be careful of the race condition: someone may delete the file after your existence check but just before you open it. You still need to handle the exception.

Mark Byers
Your answer totally disregards the fact that the file to be opened might be missing in the majority of directories. In such a case your recommendation would actually be very expensive.
0xA3
But before the framework knows it needs to throw an exception, it has to do some work. Are you saying that that work is NOT I/O work?
Russell McClure
Can you perhaps provide more information about this? In the case of the exception not being hit (and the file existing) it's definitely faster, but presumably if the file exists he is reading it, which is not much slower than stat-ing the file, right? (unless the file is an empty file and its existence is all that matters). If the exception is thrown and the stack unwinds, isn't it slower? (stat + stack unwind v. stat + jump). If you know any articles, books, or links about exceptions and I/O please provide them. Thank you.
Jared Updike
@Marc: Good point. Actually a theoretical non-throwing `File.TryOpenRead` would be the best solution in terms of both performance and clean design, but unfortunately the BCL team didn't provide one.
Ben Voigt
@Ben Voigt: Such a method is not provided because it doesn't make sense. You would have to use try/catch anyway, as IO can cause exogenous exceptions. What if, for instance, someone unplugs the external drive in the middle of a read operation? Read Eric's post (see Jared's answer).
0xA3
@0xA3: If setting up and catching a Try block costs 5, then File.Exists costs 500, and File.Open costs 510 (I'm being rather conservative here; in reality disk I/O is tremendously much slower than this would imply). Even when the majority of files are missing, the cost of setting up and catching the try blocks is negligible compared to the cost of stat-ing the files.
Lie Ryan
@Mark: awesome to put it to the test. What ratio of existing v. missing files did you try? Would the results be different with different percent missing?
Jared Updike
So from that data we can conclude that the File.Exists check is faster if more than about 1% of the files are not found.
CodeInChaos
@Lie Ryan: Did you actually *measure* these timings or is it just something that you *think* is the case?
0xA3
@Jared Updike: 100% missing (but all from the same directory). It might be interesting to see if File.Exists becomes much slower if different directories are tested.
Mark Byers
But all your file checks are to the same directory, so caching might play a huge role here. But since the same caching applies to both File.Exists and File.Open in case of success, it shouldn't change the conclusion.
CodeInChaos
This comparison doesn't even make sense. You have to compare File.Open throwing an exception versus File.Exists followed by a File.Open. You are currently timing only the creation of a new exception all on your own. What's the point of timing that in relation to the OP's question?
Russell McClure
I don't know about C# and .NET, but in Python under Linux, when I measure exception vs. exists using timeit, I get: byException 22.271638155 sec, byExists 22.3116679192 sec. There is virtually no measurable performance difference between catching the exception and checking whether the file exists. The testing code is in my answer.
Lie Ryan
@Russell McClure: The OP did specifically ask about the performance of exceptions, and I think the general performance of exceptions is interesting both in the context of this question and in a wider context, so I will leave this in my answer. But your point about the comparison is a valid one, and so I have added another test which uses File.Open.
Mark Byers
@Mark Byers: Did your conclusion change? After debugging is disabled, C# exceptions appear pretty speedy.
Lie Ryan
@0xA3: Clearly `TryOpenRead` wouldn't protect against failure in a later read or write, only against open failure. You'd need `TryRead`, etc. Moreover, it makes a lot more sense to provide `TryOpen`-type functions because inability to open a file is not an exceptional condition. `TryOpen` not only deals with non-atomicity of `Exists` then `Open`, it handles permissions issues, sharing violations, etc. Perhaps most importantly, it's FREE. The Win32 API reports `CreateFile` failures with a return value, not an exception.
Ben Voigt
@Lie Ryan: A thrown exception seems to be comparable in speed to a call to File.Exists, but File.Open seems slower than both. My conclusion now is that in general File.Exists is faster than the File.Open when the file is not found (though it is no longer two orders of magnitude - just a factor of five). I don't really understand *why* it's so much faster though. The throwing of an exception does not seem to be enough to explain the performance difference between File.Exists and File.Open.
Mark Byers
+1  A: 

I would say that, generally speaking, exceptions "increase" the overall "performance" of your system!

In your sample, anyway, it is better to use File.Exists...

Lorenzo
I think I agree, although if I do that I risk running into a race condition, but for my purposes that is all right. Why do you think it is better to use File.Exists?
akonsu
The race condition only happens if these files are being deleted and recreated rapidly, right?
Jared Updike
@Jared Updike: Exactly what I was writing :)
Lorenzo
@akonsu: Simply because, as somebody else already pointed out, checking if a file exists is not an exceptional situation...
Lorenzo
@Jared Updike: The race condition doesn't have to do with *rapid* creation or deletion. *Concurrency* is the problem, i.e. a single concurrent IO operation by another process or some user interaction such as disconnecting a device are sufficient to trigger the race condition.
0xA3
@Lorenzo: I disagree. If you get your filename from a GUI File Selector, then a missing file is an exceptional circumstance that should almost never happen unless something smirky is happening.
Lie Ryan
+7  A: 

Is this behavior truly exceptional? If it is expected, you should be testing with an if statement, and not using exceptions at all. Performance isn't the only issue with this solution and from the sound of what you are trying to do, performance should not be an issue. Therefore, style and a good approach should be the items of concern with this solution.

So, to summarize, since you expect some tests to fail, do use the File.Exists to check instead of catching exceptions after the fact. You should still catch other exceptions that can occur, of course.

Michael Goldshteyn
An IO exception is what Eric Lippert calls an [exogenous exception](http://blogs.msdn.com/b/ericlippert/archive/2008/09/10/vexing-exceptions.aspx). It definitely should be handled. But I agree with you that the check should be introduced depending on whether non-existence of the file is something that is expected during regular program execution.
0xA3
He needs to catch and handle the exceptions anyway. For one, because Exists creates a race condition (and thus makes the code harder to understand), and for another, because there are other ways for File.OpenRead to fail.
CodeInChaos
Very good answer, except for thinking that `File.Exists` is a good substitute for `File.TryOpenRead`. It's not the same, it doesn't solve the problem, so don't use it.
Ben Voigt
@0xA3: Nice link. I find myself quite peeved that there isn't a useful exception hierarchy to distinguish CPU-on-fire exceptions from others, and that there aren't more "try" methods available for things like File.Open that are apt to fail. I don't like using a blanket "catch" for stuff that can go wrong opening a file, but what realistic alternative is there? In vb.net it's possible with a little kludge to catch all but a few types of exceptions (in C# the closest one can come, I think, is to catch the dangerous ones and rethrow). Any thoughts?
supercat
@supercat: As I already mentioned in another comment, a Try* method for IO related stuff would not make sense at all. With `File.OpenRead` you would basically have to take care of the exceptions mentioned in http://msdn.microsoft.com/en-us/library/system.io.file.openread.aspx. What you ask for seems to be something like checked exceptions which is not supported by C# and a topic that is more than controversial.
0xA3
@0xA3: So should all code that wants to open a file have an explicit catch for those seven exceptions? Wouldn't it make more sense to have a TryOpenRead which returns false (possibly returning a non-thrown exception as a ref parameter) if one of those normal exceptions happens, while throwing any evil exceptions that occur?
supercat
+10  A: 

No, don't. If you use File.Exists, you introduce a concurrency problem. If you wrote this code:

if file exists then 
    open file

then if another program deletes your file between your File.Exists check and the moment you actually open the file, the program will still throw an exception.

Second, even if a file exists, that does not mean you can actually open it: you might not have permission to open the file, or the file might be on a read-only filesystem so you can't open it in write mode, etc.
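
To make the two points above concrete, here is a minimal sketch in Python (the language used for the benchmark in this answer). The helper name `read_if_present` is made up for illustration. It treats the open call itself as the existence test, which is roughly what catching FileNotFoundException around File.OpenRead does in C#, and it deliberately lets other failures such as permission errors propagate.

```python
import os
import tempfile


def read_if_present(path):
    """Return the file's bytes, or None if the file does not exist.

    The open call itself is the existence test, so there is no window
    between a separate check and the open in which another process
    could delete the file.
    """
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except FileNotFoundError:
        # Expected case while scanning directories: the file is absent.
        return None
    # Other failures (PermissionError, for example) are not caught here
    # and propagate to the caller.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        present = os.path.join(directory, "present.txt")
        with open(present, "wb") as handle:
            handle.write(b"hello")
        print(read_if_present(present))
        print(read_if_present(os.path.join(directory, "missing.txt")))
```

Whether this is also the faster option depends on how often the file is missing, which is what the timings elsewhere in this thread try to measure.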

File I/O is much, much more expensive than an exception, so there is no need to worry about the performance of exceptions.

EDIT: Benchmarking Exception vs Exists in Python under Linux

import timeit
setup = 'import random, os'

s = '''
try:
    open('does not exist_%s.txt' % random.randint(0, 10000)).read()
except Exception:
    pass
'''
byException = timeit.Timer(stmt=s, setup=setup).timeit(1000000)

s = '''
fn = 'does not exists_%s.txt' % random.randint(0, 10000)
if os.path.exists(fn):
    open(fn).read()
'''
byExists = timeit.Timer(stmt=s, setup=setup).timeit(1000000)

print 'byException: ', byException   # byException:  23.2779269218
print 'byExists: ', byExists  # byExists:  22.4937438965
Lie Ryan
What do you think the framework does before it knows it needs to throw the exception? You are acting like it can magically run some noop and know that the file is not there.
Russell McClure
I would guess that the framework just tries to open the file using the underlying OS's call (probably `CreateFile` on Windows), and if it gets back a handle it knows that it succeeded, else it knows that it failed. No race condition.
dsolimano
You're spot on with the concurrency issue but the benchmark might be misleading. Exceptions in Python are virtually free - every function call already has the overhead of passing an exception. The CLR might have different behavior.
Sean McSomething
@Russell McClure: Unfortunately yes, the OS does have some magic that we normal programmers can't easily do. Specifically, the filesystem driver can atomically check file existence and open the file at the same time. In a multitasking OS, user programs can't easily guarantee that file.exists followed by file.open will always succeed.
Lie Ryan
@Sean McSomething: you may be right.
Lie Ryan
@Lie Ryan: Take a look at my answer for the timings. Calling File.Exists is the right choice from both a performance perspective and a C# style perspective. As for the file disappearing between the existence check and the open, that would be an exceptional case which of course a good programmer would have to catch. But that is truly an exceptional case. From my timings it should be clear that you want to check for existence FIRST.
Russell McClure
+2  A: 

I don't know about efficiency but I would prefer the File.Exists check. The problem is all the other things that could happen: bad file handle, etc. If your program logic knows that sometimes the file doesn't exist and you want to have a different behavior for existing vs. non-existing files, use File.Exists. If its lack of existence is the same as other file-related exceptions, just use exception handling.

Jared Updike
Very good answer, except for thinking that `File.Exists` is a good substitute for `File.TryOpenRead`. It's not the same, it doesn't solve the problem, so don't use it.
Ben Voigt
+1  A: 

The problem with using File.Exists first is that it opens the file too. So you end up opening the file twice. I haven't measured it, but I guess this additional opening of the file is more expensive than the occasional exceptions.

Whether the File.Exists check improves performance depends on the probability of the file existing. If it likely exists then don't use File.Exists; if it usually doesn't exist then the additional check will improve performance.

CodeInChaos
Well, these exceptions are not occasional, because I am searching for files and they exist only in a limited number of directories, so I get tons of exceptions...
akonsu
So more checks fail than succeed? In that case you might try and measure if exists is faster. But you should document that it's only a performance optimization and that your code works even if a race condition occurs.
CodeInChaos
No, File.Exists doesn't open the file. There's none of the overhead of creating an OS object to represent an open file handle; it just has to read the directory entry. And you won't even incur that cost twice, because after the File.Exists check, the directory entry will be in the OS's in-memory cache.
Joe White
+3  A: 

Yes, you should use File.Exists. Exceptions should be used for exceptional situations not to control the normal flow of your program. In your case, a file not being there is not an exceptional occurrence. Therefore, you should not rely on exceptions.

UPDATE:

So everyone can try it for themselves, I'll post my test code. For non existing files, relying on File.Open to throw an exception for you is about 50 times worse than checking with File.Exists.

class Program
{
   static void Main(string[] args)
   {
      TimeSpan ts1 = TimeIt(OpenExistingFileWithCheck);

      TimeSpan ts2 = TimeIt(OpenExistingFileWithoutCheck);

      TimeSpan ts3 = TimeIt(OpenNonExistingFileWithCheck);

      TimeSpan ts4 = TimeIt(OpenNonExistingFileWithoutCheck);
   }

   private static TimeSpan TimeIt(Action action)
   {
      int loopSize = 10000;

      DateTime startTime = DateTime.Now;
      for (int i = 0; i < loopSize; i++)
      {
         action();
      }

      return DateTime.Now.Subtract(startTime);
   }

   private static void OpenExistingFileWithCheck()
   {
      string file = @"C:\temp\existingfile.txt";
      if (File.Exists(file))
      {
         using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
         {
         }
      }
   }

   private static void OpenExistingFileWithoutCheck()
   {
      string file = @"C:\temp\existingfile.txt";
      using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
      {
      }
   }

   private static void OpenNonExistingFileWithCheck()
   {
      string file = @"C:\temp\nonexistantfile.txt";
      if (File.Exists(file))
      {
         using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
         {
         }
      }
   }

   private static void OpenNonExistingFileWithoutCheck()
   {
      try
      {
         string file = @"C:\temp\nonexistantfile.txt";
         using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
         {
         }
      }
      catch (Exception ex)
      {
      }
   }
}

On my computer:

  1. ts1 = .75 seconds (same with or without debugger attached)
  2. ts2 = .56 seconds (same with or without debugger attached)
  3. ts3 = .14 seconds (same with or without debugger attached)
  4. ts4 = 14.28 seconds (with debugger attached)
  5. ts4 = 1.07 seconds (without debugger attached)

UPDATE:

I added details on whether a debugger was attached or not. I tested debug and release builds, but the only thing that made a difference was the one function that ended up throwing exceptions while the debugger was attached (which makes sense). Still, checking with File.Exists is the best choice.

Russell McClure
You've explained why `File.TryOpenRead` should be used instead of `File.OpenRead`. Unfortunately `File.TryOpenRead` doesn't actually exist. And `File.Exists` is not at all equivalent to `File.TryOpenRead`.
Ben Voigt
That confused me... couldn't get any hits on Google for TryOpenRead. I suppose if it existed... it would be helpful?
Jared Updike
Yeah, I usually try and stick with functions that exist.....
Russell McClure
Is this benchmark with debugging enabled or disabled? As the other answers have shown, debugging makes a huge difference in C#.
Lie Ryan
@Lie Ryan: Good point. It's actually not whether you have a debug build or a release build but whether a debugger is attached or not. I'll update my post with numbers without a debugger attached.
Russell McClure
Extrapolating from this benchmark, for this particular C# and .NET version the cutoff point is about 83% (0.56 * n + 1.07 * (1 - n) = 0.75 * n + 0.14 * (1 - n); n = 0.83). If more than 83% of the files exist, then try-catch will be faster than File.Exists. However, using File.Exists does not solve the race condition. So if performance matters, you expect far fewer than 80% of the files to exist, and you're not too worried about the race condition, then use File.Exists; otherwise, if program correctness and reliability matter or if you expect more than 80% of the files to exist, then use try-catch.
Lie Ryan
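The break-even estimate in the comment above can be reproduced with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the four timings are the ones posted in the answer (seconds per 10,000 iterations, without a debugger attached), so the result applies to that machine and .NET version only.

```python
def break_even_fraction(open_hit, open_miss, check_hit, check_miss):
    """Fraction of existing files at which both strategies cost the same.

    open_hit / open_miss:   cost of opening directly when the file exists
                            or is missing (the missing case throws).
    check_hit / check_miss: cost of checking for existence first when the
                            file exists or is missing.
    """
    # Extra cost of the exception path for each missing file.
    extra_on_miss = open_miss - check_miss
    # Extra cost of the existence check for each existing file.
    extra_on_hit = check_hit - open_hit
    return extra_on_miss / (extra_on_miss + extra_on_hit)


if __name__ == "__main__":
    # ts2, ts4 (no debugger), ts1, ts3 from the answer above.
    n = break_even_fraction(0.56, 1.07, 0.75, 0.14)
    print(round(n, 2))  # 0.83
```

Above that fraction of existing files the direct open is cheaper on average; below it the existence check wins.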
+5  A: 

It depends!

If there's a high chance of the file being there (you know this for your scenario; desktop.ini would be an example), I would prefer to try to open it directly. Either way, if you use File.Exists you still need to put File.OpenRead in a try/catch, both for concurrency reasons and to avoid any run-time exception, but the check would considerably boost your application's performance if the chance of the file being there is low. Ostrich algorithm

Xaqron
+3  A: 

File.Exists is a good first line of defense. If the file doesn't exist, then you're guaranteed to get an exception if you try to open it. The existence check is cheaper than the cost of throwing and catching an exception. (Maybe not much cheaper, but a bit.)

There's another consideration, too: debugging. When you're running in the debugger, the cost of throwing and catching an exception is higher, because the IDE has hooks into the exception mechanism that increase your overhead. And if you've checked any of the "Break on thrown" checkboxes in Debug > Exceptions, then any avoidable exceptions become a huge pain point. For that reason alone, I would argue for preventing exceptions when possible.

However, you still need the try-catch, for the reasons pointed out by other answers here. The File.Exists call is merely an optimization; it doesn't save you from needing to catch exceptions due to timing, permissions, solar flares, etc.
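
The combination described above, an existence check as a fast path with the exception handling kept in place, might look like the following in Python. This is a sketch with a made-up helper name, not the .NET API: `os.path.exists` stands in for File.Exists and `FileNotFoundError` for FileNotFoundException.

```python
import os
import tempfile


def read_with_precheck(path):
    """Return the file's bytes, or None if the file is missing."""
    if not os.path.exists(path):
        # Cheap early exit for the common "not here" case.
        # This is purely an optimization.
        return None
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except FileNotFoundError:
        # The file vanished between the check and the open.
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        present = os.path.join(directory, "present.txt")
        with open(present, "wb") as handle:
            handle.write(b"data")
        print(read_with_precheck(present))
        print(read_with_precheck(os.path.join(directory, "missing.txt")))
```

Removing the `os.path.exists` line changes only the speed, not the result, which is a quick way to confirm the check really is just an optimization.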

Joe White
Good point. The debugging cost should probably be the most relevant consideration.
CodeInChaos
+5  A: 

Wouldn't it be most efficient to run a directory search, find it, and then try to open it?

Dim Files() as string = System.IO.Directory.GetFiles("C:\", "SpecificName.txt", IO.SearchOption.AllDirectories)

Then you would get an array of strings that you know exist.
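
For comparison, the same idea sketched in Python (the function name is made up for illustration): walk the tree once and take the file names from each directory listing, so that no per-directory existence check or failed open is needed. Note that the name match here is case-sensitive, which may differ from how the file system itself compares names.

```python
import os
import tempfile


def find_named_files(root, target_name):
    """Yield the full path of every file called target_name under root."""
    for directory, _subdirs, filenames in os.walk(root):
        # filenames comes from the directory listing that the walk reads
        # anyway, so absence is known without a failed open.
        if target_name in filenames:
            yield os.path.join(directory, target_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        os.makedirs(os.path.join(root, "c"))
        wanted = os.path.join(root, "a", "b", "SpecificName.txt")
        with open(wanted, "w") as handle:
            handle.write("x")
        for path in find_named_files(root, "SpecificName.txt"):
            print(path)
```

A file found this way can still disappear before it is opened, so the open itself still needs its exception handling.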

Oh, and as an answer to the original question, I would say that yes, try/catch would introduce more processor cycles, but I would also assume that IO peeks actually take longer than the overhead of those processor cycles.

Running the Exists first, then the open second, is 2 IO functions against 1 of just trying to open it. So really, I'd say the overall performance is going to be a judgment call on processor time vs. hard drive speed on the PC it will be running on. If you've got a slower processor, I'd go with the check, if you've got a fast processor, I might go with the try/catch on this one.

Jrud
+1. The exceptions are irrelevant - if you are traversing the tree then *you already know whether the file exists* (except in the race scenario, but that's much less likely now you know roughly where the file(s) are). Using the API designed for *this exact use case* is just the next logical step ;)
SimonJ
I would get the files from all the directories, then check that list for the file's existence when I'm in each directory. It would be fast for files that aren't deleted in the meantime, and would revert to the 'slow' speed if exceptions have to be thrown.
Daniel Mošmondor
A: 

The overhead of an exception is noticeable, but it's not significant compared to file operations.

Yuliy