I have a thread whose function, on exiting its loop (the exit is triggered by an event), does some cleanup and then sets a different event to let a master thread know that it is done.
However, under some circumstances, SetEvent() seems not to return after it sets the thread's 'I'm done' event.
This thread is part of a DLL and the problem seems to occur after the DLL has been loaded/attached, the thread started, the thread ended and the DLL detached/unloaded a number of times without the application shutting down in between. The number of times this sequence has to be repeated before this problem happens is variable.
In case you are skeptical that I know what I'm talking about, I have determined what's happening by bracketing the SetEvent() call with calls to OutputDebugString(). The output before SetEvent() appears. Then, the waiting thread produces output that indicates that the Event has been set.
However, the second call to OutputDebugString() in the exiting thread (the one AFTER SetEvent() ) never occurs, or at least its string never shows up. If this happens, the application crashes a few moments later.
(Note that the calls to OutputDebugString() were added after the problem started occurring, so it's unlikely to be hanging there, rather than in SetEvent().)
I'm not entirely sure what causes the crash, but it occurs in the same thread in which SetEvent() didn't return immediately (I've been tracking/outputting the thread IDs). I suppose it's possible that SetEvent() is finally returning, by which point the context to which it is returning is gone/invalid, but what could cause such a delay?
It turns out that I've been blinded by looking at this code for so long, and it didn't even occur to me to check the return code. I'm done looking at it for today, so I'll know what it's returning (if it's returning) on Monday and I'll edit this question with that info then.
Update: I changed the (master) code to wait for the thread to exit rather than for it to set the event, and removed the SetEvent() call from the slave thread. This changed the nature of the bug: now, instead of SetEvent() failing to return, the thread doesn't exit at all and the whole thing hangs.
This indicates that the problem is not with SetEvent(), but something deeper. No idea what, yet, but it's good not to be chasing down that blind alley.
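For illustration, the change amounts to something like the following. This is a minimal portable C++ sketch using std::thread, not my actual Win32 code; Worker, stop_requested, cleaned_up and stop_and_wait are made-up names. In Win32 terms the equivalent would be waiting on the thread handle with WaitForSingleObject() instead of waiting on a separate 'I'm done' event.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker: loops until asked to stop, then cleans up.
struct Worker {
    std::atomic<bool> stop_requested{false};
    std::atomic<bool> cleaned_up{false};
    std::thread thread;

    void start() {
        thread = std::thread([this] {
            // Main loop: runs until the master asks for exit.
            while (!stop_requested.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            cleaned_up.store(true);  // stand-in for the real cleanup
            // No "I'm done" event is set here; returning is the signal.
        });
    }

    // Master side: request exit, then wait for the thread itself to finish.
    void stop_and_wait() {
        stop_requested.store(true);
        if (thread.joinable()) {
            thread.join();
        }
    }
};
```

Once stop_and_wait() returns, the thread function has fully returned, so there is no window in which the slave is still running code after having told the master it is done.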
Update (Feb 13/09):
It turned out that the problem was deeper than I thought when I asked this question. jdigital (and probably others) pretty much nailed the underlying problem: we were trying to shut down a thread as part of the process of detaching a DLL.
As I didn't realize at the time, but have since found out through research here and elsewhere (Raymond Chen's blog, for example), this is a Very Bad Thing.
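As I understand it, the thread can't finish exiting while the DLL detach code is still running, because both need the same loader lock, so waiting for the thread from there can hang. The usual way around this is to have the host explicitly stop the thread before it unloads the DLL. Here is a rough sketch of that shape in portable C++ (not the real code; StartWorker and StopWorker are hypothetical exported functions, and the g_ variables are made up):

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

namespace {
std::mutex g_mutex;
std::condition_variable g_cv;
bool g_stop = false;
std::thread g_worker;
}  // namespace

// Hypothetical export: the host calls this after loading the library.
bool StartWorker() {
    if (g_worker.joinable()) {
        return false;  // already running
    }
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_stop = false;
    }
    g_worker = std::thread([] {
        std::unique_lock<std::mutex> lock(g_mutex);
        // Sleep until StopWorker() asks us to exit.
        g_cv.wait(lock, [] { return g_stop; });
        // Real cleanup would go here, before the thread function returns.
    });
    return true;
}

// Hypothetical export: the host calls this BEFORE unloading the library,
// so the detach handler never has to stop or wait for a thread.
bool StopWorker() {
    if (!g_worker.joinable()) {
        return false;  // nothing to stop
    }
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_stop = true;
    }
    g_cv.notify_all();
    g_worker.join();  // wait outside the lock so the worker can wake up
    return true;
}
```

The host would then call StartWorker() after loading, and StopWorker() before unloading, each time through the load/unload cycle. These sketches are not meant to be called from the DLL's attach/detach handler.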
The problem was, because of the way it was coded and the way it was behaving, it was not obvious that that was the underlying problem - it was camouflaged as all sorts of other Bad Behaviours that I had to wade through.
Some of the suggestions here helped me do that, so I'm grateful to everyone who contributed. Thank you!