In another question, we established that yes, CreateDirectory occasionally fails with the undocumented GetLastError value of ERROR_ACCESS_DENIED, and that the right way to handle the situation is probably to try again a few times. It's easy to implement such an algorithm, but it's not so easy to test it when you don't know how to reproduce it.
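For reference, the retry algorithm I have in mind looks roughly like the sketch below. The helper name, retry count, and delay are arbitrary choices of mine (the failure windows I've observed last on the order of 10 milliseconds); in the real program the callable wraps the Win32 CreateDirectory call, shown only as a comment here so the sketch stays portable:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Retry an operation that occasionally fails transiently.
// `op` returns true on success. The attempt count and delay are
// arbitrary; tune them to cover the observed failure window
// (roughly 10 ms in my tests).
bool RetryTransient(const std::function<bool()>& op,
                    int attempts = 5,
                    std::chrono::milliseconds delay = std::chrono::milliseconds(20))
{
    for (int i = 0; i < attempts; ++i)
    {
        if (op())
            return true;
        std::this_thread::sleep_for(delay);
    }
    return false;
}

// In the actual program, `op` would wrap the Win32 call, e.g.
// (hypothetical path):
//
//   RetryTransient([] { return CreateDirectoryW(L"FooDir", nullptr) != 0; });
```

The open question is how to make `op` fail on demand so this code path gets exercised.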

I don't need theories for why this happens. It might be a bug in Windows, yeah. It might also be by design. Ultimately, at this point, it doesn't matter, because Microsoft shipped the behavior, and I must cope.

I also don't need an explanation of multi-tasking operating system theory and how Windows implements it in general. I write system software for a living. I understand little else.

What I need now is a reliable way to reproduce the failure so I can write a test case for the code which copes. Here's what I've tried so far:

  • I wrote test program P1, which slowly and repeatedly enumerates the contents of the would-be parent. As well, I wrote test program P2 which does nothing but repeatedly attempt to delete and create a directory in the would-be parent. I figured keeping an enumeration open a long time might make the problem more likely. Running P2 by itself produces an occasional period of failures (on the order of every several minutes for approximately 10 milliseconds). Running P1 and P2 at the same time does not seem to make the failures any more frequent or long.

  • I ran two instances of P2 at the same time, and that does not seem to make the failures any more frequent or long.

  • I modified P2 so that it can create files in addition to directories, and running that at the same time as P1 does not seem to make the failures any more frequent or long.

  • I ran P1 and multiple instances of P2 with different parameters all at the same time, and that does not seem to make the failures any more frequent or long.

  • I wrote test program P3 which moves items into and out of the would-be parent and ran P3 at the same time as P2, and that does not seem to make the failures any more frequent or long.

Any other ideas?

A: 

Let me start by double-checking that I understand the question. If you run something like the below snippet, you expect it to fail eventually, right?

while (true)
{
    System.IO.Directory.CreateDirectory( ".\\FooDir" );
    System.IO.Directory.Delete( ".\\FooDir" );
}

If your application is the only thing running on the system that has a handle open to that file, then this feels like a bug. So knowing the OS version would help.

On the other hand, if there is something else in the system that is keeping the handle open for just a little while, then whether this is a bug or not becomes a little more fuzzy. The number of things that try to blindly grok files and directories might surprise you. A naive indexer, for example, might be walking into that directory, enumerating it, looking for files to index and so on -- and if you collide with him, blammo. A similarly naive anti-virus filter, or some other file system filter, might be poking it as well (in this case, it still feels like a bug).

There are little things we've done in the OS to try and give services like these ways to get out of your way. Does it repro if you turn the indexer off, if you turn off any anti-virus, any anti-malware? We can go from there, and hopefully we will find that newer bits have it fixed already (that statement had a lot of assumptions in it, I know).

One other relatively interesting piece of trivia is that ERROR_ACCESS_DENIED is a Win32 error that is mapped from more than one underlying status in the system (see this article for example). So if we can dig a little deeper, we may be able to find out what the file system is trying to tell the app (if it's more than access denied).

We might end up getting into a conversation about whether you can, in the wild, assume that your app is the only thing poking at your files and directories. You can probably guess where that one will go.

jrtipton
Your loop is an essentially correct restatement of P2, except that [a] P2 makes Win32 calls in C++ rather than what I assume is C# at the .NET level, and [b] P2 collects statistics about how long CreateDirectory succeeds/fails and which GetLastError value was produced.
Integer Poet
Part of the problem at the moment is reproducing this problem, which is why I've asked what might cause it. When I originally became aware of it, it was happening once in three runs of the program I actually care about (as opposed to my test tool). Foolishly, I rebooted, and, since then, it's been very difficult to reproduce, even though I haven't changed my configuration. In other words, if indexing were the issue, it's still turned on, and I still can't repro the problem.
Integer Poet
Needless to say, I would be very interested to know which NT error got mapped onto ERROR_ACCESS_DENIED. But I'll read the list and see if my imagination comes up with anything.
Integer Poet
I originally encountered this issue on Windows 7 and I haven't regressed it anywhere else.
Integer Poet
.NET vs. Win32 API is probably not relevant; I was just re-expressing it to make certain I understand. If it's not reproducing, obviously this makes diagnosing it much more difficult. If you start seeing it again, please feel free to contact me directly. Honestly my intuition tells me it's something else in the system playing w/the directory and racing with you, but I would love to know which component is at fault so we could get a resolution. I'd still say that practically, if you want to defend against this, you will just have to accept the fact that this can happen in the real world.
jrtipton
Oh, I am absolutely resigned to that. I just want to be able to test my code which copes, and doing that makes it necessary to be able to reproduce the problem.
Integer Poet
A: 

I would wager a guess that your enumeration / deletion / creation is causing some synchronization problems with the handles. If CreateDirectory is anything like CreateFile (and I would assume the logic behind it would be shared), then you would see similar behaviour to CreateFile:

If you call CreateFile on a file that is pending deletion as a result of a previous call to DeleteFile, the function fails. The operating system delays file deletion until all handles to the file are closed. GetLastError returns ERROR_ACCESS_DENIED.
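If that delete-pending window is what's being hit, it may be possible to widen it artificially and get a deterministic repro: hold an extra handle to the directory while it is deleted, so it sits in the delete-pending state, and then try to recreate it. A Windows-only sketch (the directory name is made up, and I haven't verified this reproduces the asker's exact failure, but delete-pending is documented to surface as ERROR_ACCESS_DENIED):

```cpp
#include <windows.h>
#include <stdio.h>

int main()
{
    const wchar_t* dir = L"FooDir"; // hypothetical test directory

    if (!CreateDirectoryW(dir, nullptr))
        return 1;

    // Open an extra handle to the directory. FILE_FLAG_BACKUP_SEMANTICS
    // is required to open a directory handle with CreateFile.
    HANDLE h = CreateFileW(dir, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           nullptr, OPEN_EXISTING,
                           FILE_FLAG_BACKUP_SEMANTICS, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return 2;

    // Delete while the extra handle is open: the directory enters the
    // delete-pending state but is not actually removed until the last
    // handle to it closes.
    RemoveDirectoryW(dir);

    // Recreating a delete-pending directory should fail, with
    // GetLastError() reporting ERROR_ACCESS_DENIED.
    if (!CreateDirectoryW(dir, nullptr))
        printf("CreateDirectory failed, GetLastError = %lu\n", GetLastError());

    CloseHandle(h); // the deletion completes here
    return 0;
}
```

If this matches the failure in the wild, the implication is that some other process transiently holds a handle across the app's delete/create cycle.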

GalacticJello
In my test tool, I repeatedly create and remove a directory whose name I can pretty much guarantee nobody else would use. Say it's "my hovercraft is full of eels" (which it actually has been, but I try to use something different every time). Since my test tool is single-threaded, some other process would have to be removing my directory at the moment I try to create it. That seems unlikely.
Integer Poet
But remember you are constantly enumerating the parent directory, which would include its children, which would include your directory that you are creating and deleting (no matter how crazily it is named). Think about it: One process is commanding "Give me all of the children of X", while the other process is commanding "Create a child of X" "Delete a child of X" at the same time. All the while, your subsystems (disk, queue, MFT, etc.) are trying to keep up with all those demands. You can't ask for existence and demand its removal at the same time.
GalacticJello
Also (rereading your comment) it doesn't matter if your ONE process is single threaded, as you are running TWO processes that are trying to access the SAME resource at the same time (one is trying to enumerate, the other is trying to create/delete).
GalacticJello
While all of that seems like it ought to be true, it doesn't help me reproduce the problem any more often. I actually have two test programs. The first repeatedly creates and deletes a directory and the second repeatedly enumerates the would-be parent. The second doesn't make the first fail any more often. I'd feel bad about not yet mentioning this here if I hadn't mentioned it in the original question. :-) However, explaining this again has given me an idea for a tweak to the second program, and I'll go try that soonly.
Integer Poet
My second test program was sleeping 10 ms before each call to FindNextFile in order to keep the enumeration handle open for a "long" time. The idea was to keep the directory busy for a while so the first test program would be more likely to fail and then I'd have my answer: an enumeration in progress makes creation fail. As I said, this didn't work out.
Integer Poet
But then I realized maybe all the disk i/o happens during FindFirstFile and gets cached in the handle and FindNextFile just copies it from one place in memory to another, so the more often I call FindFirstFile, the better. So I skipped the calls to FindNextFile entirely. Sadly, this makes no difference.
Integer Poet
Calling FindNextFile without delay doesn't make a difference either.
Integer Poet
I keep saying this, but this would be a bug. Enumeration of a parent has nothing to do with the open-ability of a child.
jrtipton
You guys need to understand the concept of HANDLES and how they play behind-the-scenes. This is a synchronization issue. Enumerating opens HANDLES on objects. Open HANDLES can place locks on objects. When objects are locked, other calls trying to access those objects can fail.
GalacticJello
jrtipton, I agree. I'm not trying to fix the bug. I'm trying to cope.
Integer Poet
GJ, everybody understands that. I'm trying to figure out how to cause a lock to be taken on the would-be-parent directory so I can test the code I have which copes.
Integer Poet
You have a system process, call it S. It is in charge of committing changes to a file system. You have another process, call it P1. It is enumerating a directory, call it D1. P1 is constantly saying, "give me my CURRENT children of D1 as of RIGHT NOW". S has to commit all pending actions to answer. You have yet another process, call it P2. It is constantly creating and deleting a child in D1, demanding S to add, delete, etc. S is trying to keep up. P1 is trying to say "give me all my current children, now!". S is trying to keep up. P2 is saying "create, delete, right now!". S says ENOUGH.
GalacticJello