My Win32 app A1 (actually a collection of processes) is trying to use CreateDirectory to create a directory D1 within parent directory P. The path to P is the value of the TMP environment variable, which makes P a potentially busy but generally permissive place. The vast majority of the time, everything works fine, but, rarely, CreateDirectory fails and GetLastError then returns ERROR_ACCESS_DENIED, the meaning of which in this context is not documented.

I wrote a test application A2 which does nothing but repeatedly create and delete a directory D2 as fast as it can within P, and I chose a goofy long name for D2 which I'm confident does not collide with any that any other program would use. Once every few minutes, there's a small fraction of a second during which A2's attempts to create D2 yield only ERROR_ACCESS_DENIED failures.

A1 gets quite busy within P during its run. While A1 and A2 are running concurrently, the periods of ERROR_ACCESS_DENIED failure occur somewhat more frequently, as if A1 and A2 are competing for exclusive access to P. (I am absolutely certain that A1 does not use the same name as D2. :-)

I'm somewhat inclined to take ERROR_ACCESS_DENIED to mean "try again in a few milliseconds, and if that doesn't work after a few tries, give up", but I'm concerned that [a] in some cases it may mean something permanent that I should heed right away, and [b] because I don't really know what's happening, it may not be possible to confidently establish a reasonable amount of time to keep trying.

Anybody have experience with this? Any advice? Of particular value at this point would be clues about what causes this so I can reproduce the problem more easily.

A: 

You're dead right. The documentation doesn't even list ERROR_ACCESS_DENIED as a possible error code for that function so it may well be a bug.

I would do as you suggest in implementing a retry/backoff strategy.

In other words, make up to three attempts with no delay between them (stopping as soon as one succeeds), then up to four more attempts preceded by delays of, for example, 100 milliseconds, 500 milliseconds, 1 second and 2 seconds.

This sort of strategy (which I've used before) usually gets around any temporary resource shortages. If you still can't create the directory after 7 attempts and 3.6+ seconds, you can probably safely assume it's not going to happen.

Your function could be as ugly as (pseudo-code):

def createMyDir (dirName):
    if createDir (dirName): return true
    if createDir (dirName): return true
    if createDir (dirName): return true
    sleep (100)
    if createDir (dirName): return true
    sleep (500)
    if createDir (dirName): return true
    sleep (1000)
    if createDir (dirName): return true
    sleep (2000)
    return createDir (dirName)

but you may want to make it a little more elegant:

def createMyDir (dirName):
    delay = pointer to array [0, 0, 100, 500, 1000, 2000, -1]
    okay = createDir (dirName)
    while not okay and [delay] not -1:
        if [delay] not 0:
            sleep ([delay])
        delay = next delay
        okay = createDir (dirName)
    return okay
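
For something runnable, here is a minimal Python sketch of the same retry schedule. It is an illustration rather than a tested fix for the underlying problem. The names `create_dir_with_retry` and `RETRY_DELAYS` are made up for this example, and it assumes that the transient ERROR_ACCESS_DENIED surfaces as `PermissionError` from `os.mkdir`. Any other error, such as the directory already existing, is raised immediately instead of being retried, which goes some way towards concern [a] in the question.

```python
import os
import time

# Delays in seconds before each retry: two immediate retries after the
# first attempt, then waits of 100 ms, 500 ms, 1 s and 2 s.
RETRY_DELAYS = (0, 0, 0.1, 0.5, 1.0, 2.0)


def create_dir_with_retry(path, mkdir=os.mkdir, sleep=time.sleep,
                          delays=RETRY_DELAYS):
    """Create `path`, retrying only when access is denied.

    Returns True on success and False once every retry has failed.
    Any other OSError (for example FileExistsError) propagates at once,
    because retrying is unlikely to help.
    """
    # One initial attempt, then one more attempt after each delay.
    for delay in (None,) + tuple(delays):
        if delay:  # no sleep for the initial attempt or for zero delays
            sleep(delay)
        try:
            mkdir(path)
            return True
        except PermissionError:
            # Assumption: the transient ERROR_ACCESS_DENIED ends up here.
            continue
    return False
```

A call might look like `create_dir_with_retry(os.path.join(os.environ["TMP"], "D1"))`. The `mkdir` and `sleep` parameters exist only so the schedule can be exercised without touching the file system or actually waiting.
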
paxdiablo
This seems to work reasonably well in a test scenario. However, it seems I can no longer reproduce this problem in my real program. Any idea what causes it?
Integer Poet