views:

935

answers:

3

Lets take the sample class SomeThread where we are attempting to prevent the DoSomething methods from being called after the Running property is set to false and Dispose is called by the OtherThread class because if they are called after the Dispose method is the world would end as we know it.

It feels like there is a chance for something evil to happen because of the loop. That at the point where it starts the next loop and before the lock is taken before calling the DoSomething methods, Running could be changed to false, and Disposed called before it hits the lock. In this scenario life would not be good.

I was looking at ways to handle this when using a loop in a simple easy to maintain method. For the record I did considered the Double Lock Check patterned, however it is does not seem to be recommend for C#.

Warning: This is a simplified example to try to make it easy to focus on the issue with the loop and locking within one. If I didn't elaborate enough some place please let me know and I will do my best to fill in any details.

public class SomeThread : IDisposable
{
    private object locker = new object();
    private bool running = false;

    public bool Running 
    { 
        get
        {
            lock(locker)
            {
                return running;
            }
        }
        set
        {
            lock(locker)
            {
                running = value;
            }
        }
    }

    public void Run()
    {
        while (Running)
        {
            lock(locker)
            {
                DoSomething1();
                DoSomething2();
            }
        }
    }

    private void DoSomething1()
    {
        // something awesome happens here
    }

    private void DoSomething2()
    {
        // something more awesome happens here
    }

    public void Dispose()
    {
        lock (locker)
        {   
            Dispose1();
            Dispose2();
        }
    }

    private void Dispose1()
    {
        // something awesome happens here
    }

    private void Dispose2()
    {
        // something more awesome happens here
    }

}

public class OtherThread
{
    SomeThread st = new SomeThread();

    public void OnQuit()
    {
        st.Running = false;
        st.Dispose();

        Exit();
    }
}
+4  A: 

Check Running again inside the lock:

while (Running)
{
    lock(locker)
    {
        if(Running) {
            DoSomething1();
            DoSomething2();
        }
    }
}

You could even rewrite this as a while(true)...break, which would probably be preferable.

Anon.
That is the Double Lock Check pattern that I mentioned in the post, and it is not recommend for use in C#.
Rodney Foley
Then convert it to a `while(true)...break` loop, and you no longer check twice.
Anon.
As Eric Lippert says in his answer (currently below), the double-checked lock pattern is now OK in C#, as of CLR v2.
Gabriel
Even so Eric states why in the comment it is still not recommended. Beyond that is is not 100% Thread Safe, especially when dealing with Foreground threads and making sure they are shut down correctly before an application exits. This example code doesn't fall under one of the few "bless" practices for using the Double Lock Check pattern.
Rodney Foley
+20  A: 

Take a step back.

Start by specifying all the desirable and undesirable characteristics before you start to write a solution. A few that come immediately to mind:

  • The "work" is done on thread W. The "UI" is done on thread U.
  • The work is done in "units of work". Each unit of work is "short" in duration, for some definition of "short". Let's call the method that does the work M().
  • The work is done continuously by W, in a loop, until U tells it to stop.
  • U calls a cleanup method, D(), when all the work is done.
  • D() must not ever run before or while M() is running.
  • Exit() must be called after D(), on thread U.
  • U must never block for a "long" time; it is acceptable for it to block for a "short" time.
  • No deadlocks, and so on.

Does this sum up the problem space?

First off, I note that it seems at first glance that the problem is that U must be the caller of D(). If W were the caller of D(), then you wouldn't have to worry; you'd just signal W to break out of the loop, and then W would call D() after the loop. But that just trades one problem for another; presumably in this scenario, U must wait for W to call D() before U calls Exit(). So moving the call to D() from U to W doesn't actually make the problem easier.

You've said that you don't want to use double-checked locking. You should be aware that as of CLR v2, the double-checked locking pattern is known to be safe. The memory model guarantees were strengthened in v2. So it is probably safe for you to use double-checked locking.

UPDATE: You asked for information on (1) why is double-checked locking safe in v2 but not in v1? and (2) why did I use the weasel-word "probably"?

To understand why double-checked locking is unsafe in the CLR v1 memory model but safe in the CLR v2 memory model, read this:

http://msdn.microsoft.com/en-us/magazine/cc163715.aspx

I said "probably" because as Joe Duffy wisely says:

once you venture even slightly outside of the bounds of the few "blessed" lock-free practices [...] you are opening yourself up to the worst kind of race conditions.

I do not know if you are planning on using double-checked locking correctly, or if you're planning on writing your own clever, broken variation on double-checked locking that in fact dies horribly on IA64 machines. Hence, it will probably work for you, if your problem is actually amenable to double checked locking and you write the code correctly.

If you care about this you should read Joe Duffy's articles:

http://www.bluebytesoftware.com/blog/2006/01/26/BrokenVariantsOnDoublecheckedLocking.aspx

and

http://www.bluebytesoftware.com/blog/2007/02/19/RevisitedBrokenVariantsOnDoubleCheckedLocking.aspx

And this SO question has some good discussion:

http://stackoverflow.com/questions/1964731/the-need-for-volatile-modifier-in-double-checked-locking-in-net

Probably it is best to find some other mechanism other than double-checked locking.

There is a mechanism for waiting for one thread which is shutting down to complete -- thread.Join. You could join from the UI thread to the worker thread; when the worker thread is shut down, the UI thread wakes up again and does the dispose.

UPDATE: Added some information on Join.

"Join" basically means "thread U tells thread W to shut down, and U goes to sleep until that happens". Brief sketch of the quit method:

// do this in a thread-safe manner of your choosing
running = false; 
// wait for worker thread to come to a halt
workerThread.Join(); 
// Now we know that worker thread is done, so we can 
// clean up and exit
Dispose(); 
Exit();   

Suppose you didn't want to use "Join" for some reason. (Perhaps the worker thread needs to keep running in order to do something else, but you still need to know when it is done using the objects.) We can build our own mechanism that works like Join by using wait handles. What you need now are two locking mechanisms: one that lets U send a signal to W that says "stop running now" and then another that waits while W finishes off the last call to M().

What I would do in this circumstance is:

  • make a thread-safe flag "running". Use whatever mechanism you are comfortable with to make it thread safe. I would personally start with a lock dedicated to it; if you decide later that you can go with lock-free interlocked operations on it then you can always do that later.
  • make an AutoResetEvent to act as a gate on the dispose.

So, brief sketch:

UI thread, startup logic:

running = true
waithandle = new AutoResetEvent(false)
start up worker thread

UI thread, quit logic:

running = false; // do this in a thread-safe manner of your choosing
waithandle.WaitOne(); 

// WaitOne is robust in the face of race conditions; if the worker thread
// calls Set *before* WaitOne is called, WaitOne will be a no-op.  (However,
// if there are *multiple* threads all trying to "wake up" a gate that is
// waiting on WaitOne, the multiple wakeups will be lost. WaitOne is named
// WaitOne because it WAITS for ONE wakeup. If you need to wait for multiple
// wakeups, don't use WaitOne.

Dispose();
waithandle.Close();
Exit();    

worker thread:

while(running) // make thread-safe access to "running"
    M();
waithandle.Set(); // Tell waiting UI thread it is safe to dispose

Notice that this relies on the fact that M() is short. If M() takes a long time then you can wait a long time to quit the application, which seems bad.

Does that make sense?

Really though, you shouldn't be doing this. If you want to wait for the worker thread to shut down before you dispose an object it is using, just join it.

UPDATE: Some additional questions raised:

is it a good idea to wait without a timeout?

Indeed, note that in my example with Join and my example with WaitOne, I do not use the variants on them that wait for a specific amount of time before giving up. Rather, I call out that my assumption is that the worker thread shuts down cleanly and quickly. Is this the correct thing to do?

It depends! It depends on just how badly the worker thread behaves and what it is doing when it is misbehaving.

If you can guarantee that the work is short in duration, for whatever 'short' means to you, then you don't need a timeout. If you cannot guarantee that, then I would suggest first rewriting the code so that you can guarantee that; life becomes much easier if you know that the code will terminate quickly when you ask it to.

If you cannot, then what's the right thing to do? The assumption of this scenario is that the worker is ill-behaved and does not terminate in a timely manner when asked to. So now we've got to ask ourselves "is the worker slow by design, buggy, or hostile?"

In the first scenario, the worker is simply doing something that takes a long time and for whatever reason, cannot be interrupted. What's the right thing to do here? I have no idea. This is a terrible situation to be in. Presumably the worker is not shutting down quickly because doing so is dangerous or impossible. In that case, what are you going to do when the timeout times out??? You've got something that is dangerous or impossible to shut down, and its not shutting down in a timely manner. Your choices seem to be (1) do nothing, (2) do something dangerous, or (3) do something impossible. Choice three is probably out. Choice one is equivalent to waiting forever, whcih we've already rejected. That leaves "do something dangerous".

Knowing what the right thing to do in order to minimize harm to user data depends upon the exact circumstances that are causing the danger; analyse it carefully, understand all the scenarios, and figure out the right thing to do.

Now suppose the worker is supposed to be able to shut down quickly, but does not because it has a bug. Obviously, if you can, fix the bug. If you cannot fix the bug -- perhaps it is in code you do not own -- then again, you are in a terrible fix. You have to understand what the consequences are of not waiting for already-buggy-and-therefore-unpredictable code to finish before disposing of the resources that you know it is using right now on another thread. And you have to know what the consequences are of terminating an application while a buggy worker thread is still busy doing heaven only knows what to operating system state.

If the code is hostile and is actively resisting being shut down then you have already lost. You cannot halt the thread by normal means, and you cannot even thread abort it. There is no guarantee whatsoever that aborting a hostile thread actually terminates it; the owner of the hostile code that you have foolishly started running in your process could be doing all of its work in a finally block or other constrained region which prevents thread abort exceptions.

The best thing to do is to never get into this situation in the first place; if you have code that you think is hostile, either do not run it at all, or run it in its own process, and terminate the process, not the thread when things go badly.

In short, there's no good answer to the question "what do I do if it takes too long?" You are in a terrible situation if that happens and there is no easy answer. Best to work hard to ensure you don't get into it in the first place; only run cooperative, benign, safe code that always shuts itself down cleanly and rapidly when asked.

What if the worker throws an exception?

OK, so what if it does? Again, better to not be in this situation in the first place; write the worker code so that it does not throw. If you cannot do that, then you have two choices: handle the exception, or don't handle the exception.

Suppose you don't handle the exception. As of I think CLR v2, an unhandled exception in a worker thread shuts down the whole application. The reason being, in the past what would happen is you'd start up a bunch of worker threads, they'd all throw exceptions, and you'd end up with a running application with no worker threads left, doing no work, and not telling the user about it. It is better to force the author of the code to handle the situation where a worker thread goes down due to an exception; doing it the old way effectively hides bugs and makes it easy to write fragile applications.

Suppose you do handle the exception. Now what? Something threw an exception, which is by definition an unexpected error condition. You now have no clue whatsoever that any of your data is consistent or any of your program invariants are maintained in any of your subsystems. So what are you going to do? There's hardly anything safe you can do at this point.

The question is "what is best for the user in this unfortunate situation?" It depends on what the application is doing. It is entirely possible that the best thing to do at this point is to simply aggressively shut down and tell the user that something unexpected failed. That might be better than trying to muddle on and possibly making the situation worse, by, say, accidentally destroying user data while trying to clean up.

Or, it is entirely possible that the best thing to do is to make a good faith effort to preserve the user's data, tidy up as much state as possible, and terminate as normally as possible.

Basically, both your questions are "what do I do when my subsystems do not behave themselves?" If your subsystems are unreliable, either make them reliable, or have a policy for how you deal with an unreliable subsystem, and implement that policy. That's a vague answer I know, but that's because dealing with an unreliable subsystem is an inherently awful situation to be in. How you deal with it depends on the nature of its unreliability, and the consequences of that unreliability to the user's valuable data.

Eric Lippert
Is it really a good idea to call `WaitOne` without a timeout? What happens if the worker thread throws an exception?
ChaosPandion
Two good questions. I'll update the text.
Eric Lippert
I would like to apologize for essentially calling you a **d-bag**. In reality I have a massive amount of respect for you and your work.
ChaosPandion
Apology accepted; no worries!
Eric Lippert
Still digesting this Eric. :) Do you know of any articles that are about the CLR v2 changes to make double-checked locking pattern safe? The main reason I didn't want to use it, is because I can't find anything saying it is safe yet to use with the CLR. Also if it was safe, you made a comment saying that it is "probably safe". Basically I am trying to find a 100% thread safe solution to this problem when dealing with loops. Thread Joining seems like a good solution, I am still looking into that as well.
Rodney Foley
I didn't reject thread Joining, just how it was presented by nobugz and really I asked him a question about it and got no response. In the example I have there is no way to do a Join as-is, and he provided no example of how to modify the code to use a Join. I am looking into how I can do this myself.
Rodney Foley
@Creepy Gnome, I have updated the text to answer your two questions.
Eric Lippert
@Eric Lippert Thanks, and can't use Join for the reason you mentioned as a possible reasons why. Also based on the feedback around the double-checked locking pattern I would agree best to avoid if another way is available. The WaitOne / Set seems to fit best for this situation, and is completely thread safe and easy to maintain and understand.
Rodney Foley
@Eric Lippert another question on using AutoResetEvent. The sample code you provided seems to assume both threads have access to same instance of AutoResetEvent in order to communicate with each other. I did see an easy or safe way to share this instance with the two threads? Especially if on is a GUI thread, assuming the the OtherThread class is the GUI Thread, and the SomeThread class is a Foreground thread would we create the AutoResetEvent instance in the OtherThread and pass it to the SomeThread when it is created?
Rodney Foley
@Creepy Gnome, it doesn't matter on what thread you create the gate, just so long as you ensure that it is created before either thread uses it. Your proposal of creating the gate and passing it in when the new thread is created seems reasonable. (Also note that though it is not strictly *necessary* to "close" the gate when you're done with it, it is good style to do so.)
Eric Lippert
@Eric Lippert thanks again. So would I still need to keep the mutex object around for locking as in my original sample code, or is a bool enough which is atomic already for changes(and I can mark it volatile to extra safe) enough with the gate object in place?
Rodney Foley
@Creepy Gnome, you're welcome. My choice is always to use explicit locking until performance testing demonstrates that the lock is too slow. It's not too hard to make a lock-free bool flag, particularly one that only goes from true to false throughout its lifetime, but I wouldn't do so unless I had a good reason. Lock-free code makes me very, very nervous.
Eric Lippert
A: 

Instead of using a bool for Running, why not use an Enum with states of Stopped, Starting, Running, and Stopping?

That way, you break out of the loop when Running gets set to Stopping, and do your Disposing. Once that's done, Running gets set to Stopped. When OnQuit() sees Running set to Stopped, it will go ahead and exit.

Edit: Here's code, quick and dirty, not tested, etc.

public class SomeThread : IDisposable 
{ 
    private object locker = new object(); 
    private RunState running = RunState.Stopped;

    public enum RunState
    {
        Stopped,
        Starting,
        Running,
        Stopping,
    }

    public RunState Running  
    {  
        get 
        { 
            lock(locker) 
            { 
                return running; 
            } 
        } 
        set 
        { 
            lock(locker) 
            { 
                running = value; 
            } 
        } 
    } 

    public void Run() 
    { 
        while (Running == RunState.Running) 
        { 
            lock(locker) 
            { 
                DoSomething1(); 
                DoSomething2(); 
            } 
        } 

        Dispose();

    } 

    private void DoSomething1() 
    { 
        // something awesome happens here 
    } 

    private void DoSomething2() 
    { 
        // something more awesome happens here 
    } 

    public void Dispose() 
    { 
        lock (locker) 
        {    
            Dispose1(); 
            Dispose2(); 
        } 
        Running = RunState.Stopped;
    } 

    private void Dispose1() 
    { 
        // something awesome happens here 
    } 

    private void Dispose2() 
    { 
        // something more awesome happens here 
    } 

} 

public class OtherThread 
{ 
    SomeThread st = new SomeThread(); 

    public void OnQuit() 
    { 
        st.Running = SomeThread.RunState.Stopping; 
        while (st.Running == SomeThread.RunState.Stopping) 
        {                 
            // Do something while waiting for the other thread.
        }  

        Exit(); 
    } 
} 
Moose
You could also move the call to Dispose() into the quit method, and simply make the worker thread set the state to Stopped immediately upon exiting the loop. But the flaw in your plan is the spinning on the UI thread; what are you going to do in that loop? You could just sleep for a while. But I say that if you want to sleep until a condition is true, **do not roll your own code**. Solve the problem with an operating system object specifically designed to solve the problem of "sleep until a condition is met", like a wait handle.
Eric Lippert
I agree, Eric, and I was headed that way when I got sidetracked. Rather than leave my post hanging, I opted for the easy out of "// Do Something here." I just wanted to put forth the idea of having Running conveys information out of the thread as well as into the thread.
Moose