I'm trying to fix an intermittent bug in git-svn. The problem is happening in Windows XP only, with both Cygwin git (perl v5.10.1) and msysGit (perl v5.8.8).

With any operation that involves a fetch, I'm able to get partway through and then the operation dies with a message similar to

Couldn't open .git/svn/refs/remotes/trunk/.rev_map.cc05479a-e8ea-436f-8d71-e07493b7796c.lock: Device or resource busy

at /usr/lib/git-core/git-svn line 5240

However, the exact lock file and line number are not always the same. I've tracked the actual problem to line 3679

sysopen(my $fh, $db_lock, O_RDWR | O_CREAT)

This creates a new .lock file. I also tried the equivalent open call, to no avail:

open(my $fh, ">", $db_lock)

I checked the permissions of the directory, and it is drwxr-xr-x, so there shouldn't be any problems, or if there were, they wouldn't be so inconsistent.

Could this be because the script is creating and renaming this file so many times in quick succession that XP can't handle it? EDIT: my suspicion is that this is the case, because when I used the perl debugger and kicked off the execution of each sysopen manually, there were no problems for the 100 revisions I fetched.
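One way to test the quick-succession hypothesis outside git-svn would be a small stress loop. The sketch below is only an assumption about the access pattern described above (rename the map file to a .lock name, open it, write, close, rename it back); it is not git-svn's actual code, and the sub name stress_lock_cycle and the file name rev_map.example are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT SEEK_END);
use File::Temp qw(tempdir);

# Hypothetical stress test: repeatedly move a map file to a .lock name,
# open it, append a record, close it, and rename it back.
# Returns a hash reference of "step: error message" => count.
sub stress_lock_cycle {
    my ($dir, $iterations) = @_;
    my $db      = "$dir/rev_map.example";   # made-up file name
    my $db_lock = "$db.lock";
    my %failures;

    for my $i (1 .. $iterations) {
        if (-e $db && !rename($db, $db_lock)) {
            $failures{"rename to lock: $!"}++;
            next;
        }
        my $fh;
        unless (sysopen($fh, $db_lock, O_RDWR | O_CREAT)) {
            $failures{"sysopen: $!"}++;
            next;
        }
        sysseek($fh, 0, SEEK_END);          # append to the existing data
        syswrite($fh, pack('N', $i));       # 4-byte dummy record
        close $fh;
        rename($db_lock, $db) or $failures{"rename back: $!"}++;
    }
    return \%failures;
}

my $dir      = tempdir(CLEANUP => 1);
my $failures = stress_lock_cycle($dir, 1000);
printf "%-40s %d\n", $_, $failures->{$_} for sort keys %$failures;
print "no failures\n" unless %$failures;
```

If a loop like this reports "Device or resource busy" on the affected XP machines but not elsewhere, that would point at something outside git-svn touching the files between operations.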

EDIT: Some Git developers would much rather find out the root cause than go with a hack that happens to work (the right approach, I think). So, can anyone help me find the culprit denying me permission to open these .lock files? I have a number of tools that could theoretically be used for this purpose, but they don't quite go all the way:

  • Process Explorer - shows all handles owned by a process, and can also search for all processes owning a given handle. However, it doesn't work well for short-lived processes or handles (which is what git svn clone/fetch create)
  • Unlocker - Detects when a generic 'permission denied' dialog appears and finds the offending handle(s) and offers to deal with them. However, it doesn't come up when non-explorer programs encounter file-based errors

In short, is there any way I can get more information without being a Microsoft employee?

EDIT 2: It's probably not Symantec, but another program we have running on the networked computers. I have some people looking into it, and they should be able to at least narrow down the cause here.

+4  A: 

My current hack of a solution is to replace the sysopen with this

my $fh;
my $opened;
if ($^O eq 'MSWin32' or $^O eq 'cygwin') {
    # Retry up to 10 times on problematic systems
    for my $try (1..10) {
        # Test sysopen's return value; $fh itself is not a reliable success flag
        $opened = sysopen($fh, $db_lock, O_RDWR | O_CREAT);
        last if $opened;
    }
} else {
    $opened = sysopen($fh, $db_lock, O_RDWR | O_CREAT);
}

croak "Couldn't open $db_lock: $!\n" unless $opened;

And so far, it's working pretty well. My original version printed a . on each retry: most of the time it doesn't print any .'s, occasionally it prints one, and I haven't seen it print more than one in a row. Is this solution too hacky?

Edit: My code replaced by Ævar Arnfjörð Bjarmason's cleaned up version.

drhorrible
That makes sense. A lot of code out there lacks robust behavior.
Axeman
I'd suggest a Time::HiRes::sleep(.01) in that loop. Also, maybe `last if $fh || $! != Errno::EBUSY`
ysth
Still haven't nailed down the cause, so I'm giving the bounty to myself
drhorrible
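ysth's suggestion in the comments above (a short sleep between attempts, and giving up early on anything other than EBUSY) could be sketched roughly as follows. This is an illustration only, not what git-svn does: the helper name open_lock_with_retry, its default of 10 tries, and the 0.01 second delay are assumptions.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Errno qw(EBUSY);
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep);

# Hypothetical helper: open (or create) a lock file, retrying only while
# the failure is EBUSY. $max_tries and $delay are illustrative defaults.
sub open_lock_with_retry {
    my ($path, $max_tries, $delay) = @_;
    $max_tries ||= 10;
    $delay = 0.01 unless defined $delay;

    for my $try (1 .. $max_tries) {
        my $fh;
        # Rely on sysopen's return value to detect success.
        return $fh if sysopen($fh, $path, O_RDWR | O_CREAT);
        # Errors other than EBUSY are not expected to clear on retry.
        last if $! != EBUSY;
        # Pause briefly before the next attempt (skipped after the last one).
        sleep($delay) if $try < $max_tries;
    }
    croak "Couldn't open $path: $!\n";
}

# Example use against a throwaway directory.
my $dir = tempdir(CLEANUP => 1);
my $fh  = open_lock_with_retry("$dir/example.lock");
close $fh;
```

Restricting the retry to EBUSY keeps genuine failures, such as a missing directory, from being delayed or hidden behind ten silent attempts.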
+5  A: 

This sort of behavior can usually be attributed to an antivirus component keeping the file open and delaying deletion.

Ben Voigt
That's my best guess, too. I don't have any control over the antivirus that runs on my machine, so the only solution is to change Git, right? Or maybe just msysGit?
drhorrible
Can you even disable it temporarily to see if the problem goes away? In your question you talk about finding and fixing the root cause... if the antivirus is buggy then any change you make to git will be nothing more than a workaround.
Ben Voigt
It's Symantec, and I can't kill or pause it. I could try to work with IT, but if it is the problem, wouldn't I then have to rely on Symantec to fix it?
drhorrible
Or tell your IT department that Symantec doesn't actually work right, and (some other antivirus solution you test with) works fine. Definitely more effort than adding a workaround to git, but (1) there are literally thousands of other applications that have experienced problems caused by bugs in Norton Antivirus, how many of these are you running into on your computer, and (2) the git team is very unlikely to accept such a workaround into trunk, so you'll have to redo the workaround for every new release of git.
Ben Voigt
Symantec hasn't caused me any problems yet (that I know of) and I really doubt I'll be able to get my company to switch. Maybe if I sent the patch to msysGit?
drhorrible
@drhorrible: if an "official" mention of the problem with symantec may be of use, see the Cygwin Big List of Dodgy Apps: http://cygwin.com/faq/faq.using.html#faq.using.bloda
ysth
We've identified the software that's causing this issue and are adding git to its whitelist.
drhorrible
@drhorrible: Glad you were able to address the root cause.
Ben Voigt
+1  A: 

I would use Process Monitor and let it run until the failure happens again. Then in Process Monitor you should see an error while your program accesses the file (most likely either ACCESS_DENIED or SHARING_VIOLATION). Then you can filter by that filename and see what other processes (if any) opened it.

Luke
I think what I'm getting is "FAST IO DISALLOWED". I should be filtering on the 'Path' column, right?
drhorrible
You can ignore "FAST IO DISALLOWED"; it's normal. When an error occurs you should filter the "Path" column to include only the file that caused the error; this should show you what other processes were accessing it.
Luke
A: 

If your program is calling "fork()" or "system()" or "exec()" anywhere, this could very probably be the root of the problem.

krico