views:

205

answers:

7

Hi all,

I asked the question below couple of weeks ago. Now, when reviewing my question and all the answers, a very important detail jumped into my eyes: In my second code example, isn't DoTheCodeThatNeedsToRunAsynchronously() executed in the main (UI) thread? Doesn't the timer just wait a second and then post an event to the main thread? This would mean then that the code-that-needs-to-run-asynchronously isn't run asynchronously at all?!

Original question:


I have recently faced a problem multiple times and solved it in different ways, always being uncertain on whether it is thread safe or not: I need to execute a piece of C# code asynchronously. (Edit: I forgot to mention I'm using .NET 3.5!)

That piece of code works on an object that is provided by the main thread code. (Edit: Let's assume that object is thread-safe in itself.) I'll present you two ways I tried (simplified) and have these four questions:

  1. What is the best way to achieve what I want? Is it one of the two or another approach?
  2. Is one of the two ways not thread-safe (I fear both...) and why?
  3. The first approach creates a thread and passes it the object in the constructor. Is that how I'm supposed to pass the object?
  4. The second approach uses a timer which doesn't provide that possibility, so I just use the local variable in the anonymous delegate. Is that safe or is it possible in theory that the reference in the variable changes before it is evaluated by the delegate code? (This is a very generic question whenever one uses anonymous delegates). In Java you are forced to declare the local variable as final (i.e. it cannot be changed once assigned). In C# there is no such possibility, is there?

Approach 1: Thread

new Thread(new ParameterizedThreadStart(
    delegate(object parameter)
    {
        Thread.Sleep(1000); // wait a second (for a specific reason)

        MyObject myObject = (MyObject)parameter;

        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();

    })).Start(this.MyObject);

There is one problem I had with this approach: My main thread might crash, but the process still persists in the memory due to the zombie thread.


Approach 2: Timer

MyObject myObject = this.MyObject;

System.Timers.Timer timer = new System.Timers.Timer();
timer.Interval = 1000;
timer.AutoReset = false; // i.e. only run the timer once.
timer.Elapsed += new System.Timers.ElapsedEventHandler(
    delegate(object sender, System.Timers.ElapsedEventArgs e)
    {
        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();
    });

DoSomeStuff();
myObject = that.MyObject; // hypothetical second assignment.

The local variable myObject is what I'm talking about in question 4. I've added a second assignment as an example. Imagine the timer elapses after the second assigment, will the delegate code operate on this.MyObject or that.MyObject?

+2  A: 

Your first attempt is pretty good, but the thread continued to exist even after the application exits, because you didn't set the IsBackground property to true... here is a simplified (and improved) version of your code:

MyObject myObject = this.MyObject;
Thread t = new Thread(()=>
    {
        Thread.Sleep(1000); // wait a second (for a specific reason)
        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();
    });
t.IsBackground = true;
t.Start();

With regards to the thread safety: it's difficult to tell if your program functions correctly when multiple threads execute simultaneously, because you're not showing us any points of contention in your example. It's very possible that you will experience concurrency issues if your program has contention on MyObject.

Java has the final keyword and C# has a corresponding keyword called readonly, but neither final nor readonly ensure that the state of the object you're modifying will be consistent between threads. The only thing these keywords do is ensure that you do not change the reference the object is pointing to. If two threads have read/write contention on the same object, then you should perform some type of synchronization or atomic operations on that object in order to ensure thread safety.

Update

OK, if you modify the reference to which myObject is pointing to, then your contention is now on myObject. I'm sure that my answer will not match your actual situation 100%, but given the example code you've provided I can tell you what will happen:

You will not be guaranteed which object gets modified: it can be that.MyObject or this.MyObject. That's true regardless if you're working with Java or C#. The scheduler may schedule your thread/timer to be executed before, after or during the second assignment. If you're counting on a specific order of execution, then you have to do something to ensure that order of execution. Usually that something is a communication between the threads in the form of a signal: a ManualResetEvent, Join or something else.

Here is a join example:

MyObject myObject = this.MyObject;
Thread task = new Thread(()=>
    {
        Thread.Sleep(1000); // wait a second (for a specific reason)
        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();
    });
task.IsBackground = true;
task.Start();
task.Join(); // blocks the main thread until the task thread is finished
myObject = that.MyObject; // the assignment will happen after the task is complete

Here is a ManualResetEvent example:

ManualResetEvent done = new ManualResetEvent(false);
MyObject myObject = this.MyObject;
Thread task = new Thread(()=>
    {
        Thread.Sleep(1000); // wait a second (for a specific reason)
        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();
        done.Set();
    });
task.IsBackground = true;
task.Start();
done.WaitOne(); // blocks the main thread until the task thread signals it's done
myObject = that.MyObject; // the assignment will happen after the task is done

Of course, in this case it's pointless to even spawn multiple threads, since you're not going to allow them to run concurrently. One way to avoid this is by not changing the reference to myObject after you've started the thread, then you won't need to Join or WaitOne on the ManualResetEvent.

So this leads me to a question: why are you assigning a new object to myObject? Is this a part of a for-loop which is starting multiple threads to perform multiple asynchronous tasks?

Lirik
(Removed my comment -- didn't see his question about "zombie threads")
Kirk Woll
@Lirik, thanks for the `IsBackground` hint. Concerning the variable I indeed meant chaning the *reference*, so following your answer I can declare a local variable as `readonly` and prevent the reference from being changed just like in Java, can I?
chiccodoro
@Lirik: I've found that `readonly` cannot be used for local variables, see also: http://stackoverflow.com/questions/443687/why-does-c-disallow-readonly-local-variables
chiccodoro
@chiccodoro, ah, yes... I forgot to mention that while the readonly and the final keyword do similar things, they're still different in some cases: readonly cannot be used for a local variable and it cannot be used in front of a method.
Lirik
@Lirik: Thank you. Yes there is a difference between Java and C#: In Java you can declare the variable as "final" to make sure it (the reference) is never changed. In fact you're even forced to if you want to use it in an anonymous inner class code (which corresponds more or less to an anonymous delegate in C#). - Your question why I want to re-assign: I do not want to, but I was interested (in particular) in what would happen if it is reassigned by accident and how to prevent that.
chiccodoro
+3  A: 

Whether or not either of these pieces of code is safe has to do with the structure of MyObject instances. In both cases you are sharing the myObject variable between the foreground and background threads. There is nothing stopping the foreground thread from modifying myObject while the background thread is running.

This may or may not be safe and depends on the structure of MyObject. However if you haven't specifically planned for it then it's most certainly an unsafe operation.

JaredPar
It's also worth noting that the `myObject` local is closed over in the second case, which means that assignments to the reference itself (presumably added in subsequent lines of code after the definition of the anonymous function) would be observed when the code executes.
LBushkin
Hi LBushkin, can you expand on that? What does "closed over" mean, and "be observed" mean? I add a hypothetical second assignment to my example code.
chiccodoro
@JaredPar: How right you are! Just to narrow down my question to these very code parts, let's assume that MyObject is programmed thread-safely.
chiccodoro
+2  A: 

"Thread-safe" is a tricky beast. With both of your approches, the problem is that the "MyObject" your thread is using may be modified/read by multiple threads in a way that makes the state appear inconsistent, or makes your thread behave in a way inconsistent with actual state.

For example, say your MyObject.ChangeSomeproperty() MUST be called before MyObject.DoSomethingElse(), or it throws. With either of your approaches, there is nothing to stop any other thread from calling DoSomethingElse() before the thread that will call ChangeSomeProperty() finishes.

Or, if ChangeSomeProperty() happens to be called by two threads, and it (internally) changes state, the thread context switch may happen while the first thread is in the middle of it's work and the end result is that the actual new state after both threads is "wrong".

However, by itself, neither of your approaches is inherently thread-unsafe, they just need to make sure that changing state is serialized and that accessing state is always giving a consistent result.

Personally, I wouldn't use the second approach. If you're having problems with "zombie" threads, set IsBackground to true on the thread.

Philip Rieck
+2  A: 

What is the best way to achieve what I want? Is it one of the two or another approach?

Both look fine, but...

Is one of the two ways not thread-safe (I fear both...) and why?

...they are not thread safe unless MyObject.ChangeSomeProperty() is thread safe.

The first approach creates a thread and passes it the object in the constructor. Is that how I'm supposed to pass the object?

Yes. Using a closure (as in your second approach) is fine as well, with the additional advantage that you don't need to do a cast.

The second approach uses a timer which doesn't provide that possibility, so I just use the local variable in the anonymous delegate. Is that safe or is it possible in theory that the reference in the variable changes before it is evaluated by the delegate code? (This is a very generic question whenever one uses anonymous delegates).

Sure, if you add myObject = null; directly after setting timer.Elapsed, then the code in your thread will fail. But why would you want to do that? Note that changing this.MyObject will not affect the variable captured in your thread.


So, how to make this thread-safe? The problem is that myObject.ChangeSomeProperty(); might run in parallel with some other code that modifies the state of myObject. There are basically two solutions to that:

Option 1: Execute myObject.ChangeSomeProperty() in the main UI thead. This is the simplest solution if ChangeSomeProperty is fast. You can use the Dispatcher (WPF) or Control.Invoke (WinForms) to jump back to the UI thread, but the easiest way is to use a BackgroundWorker:

MyObject myObject = this.MyObject;
var bw = new BackgroundWorker();

bw.DoWork += (sender, args) => {
    // this will happen in a separate thread
    Thread.Sleep(1000);
    DoTheCodeThatNeedsToRunAsynchronously();
}

bw.RunWorkerCompleted += (sender, args) => {
    // We are back in the UI thread here.

    if (args.Error != null)  // if an exception occurred during DoWork,
        MessageBox.Show(args.Error.ToString());  // do your error handling here
    else
        myObject.ChangeSomeProperty();
}

bw.RunWorkerAsync(); // start the background worker

Option 2: Make the code in ChangeSomeProperty() thread-safe by using the lock keyword (inside ChangeSomeProperty as well as inside any other method modifying or reading the same backing field).

Heinzi
Thx for the `BackgroundWorker` example, since it seems I cannot use the `Task` in 3.5
chiccodoro
+3  A: 

I recommend using Task objects, and restructuring the code so that the background task returns its calculated value rather than changing some shared state.

I have a blog entry that discusses five different approaches to background tasks (Task, BackgroundWorker, Delegate.BeginInvoke, ThreadPool.QueueUserWorkItem, and Thread), with the pros and cons of each.

To answer your questions specifically:

  1. What is the best way to achieve what I want? Is it one of the two or another approach? The best solution is to use the Task object instead of a specific Thread or timer callback. See my blog post for all the reasons why, but in summary: Task supports returning a result, callbacks on completion, proper error handling, and integration with the universal cancellation system in .NET.
  2. Is one of the two ways not thread-safe (I fear both...) and why? As others have stated, this totally depends on whether MyObject.ChangeSomeProperty is threadsafe. When dealing with asynchronous systems, it's easier to reason about threadsafety when each asynchronous operation does not change shared state, and rather returns a result.
  3. The first approach creates a thread and passes it the object in the constructor. Is that how I'm supposed to pass the object? Personally, I prefer using lambda binding, which is more type-safe (no casting necessary).
  4. The second approach uses a timer which doesn't provide that possibility, so I just use the local variable in the anonymous delegate. Is that safe or is it possible in theory that the reference in the variable changes before it is evaluated by the delegate code? Lambdas (and delegate expressions) bind to variables, not to values, so the answer is yes: the reference may change before it is used by the delegate. If the reference may change, then the usual solution is to create a separate local variable that is only used by the lambda expression,

as such:

MyObject myObject = this.MyObject;
...
timer.AutoReset = false; // i.e. only run the timer once.
var localMyObject = myObject; // copy for lambda
timer.Elapsed += new System.Timers.ElapsedEventHandler(
  delegate(object sender, System.Timers.ElapsedEventArgs e)
  {
    DoTheCodeThatNeedsToRunAsynchronously();
    localMyObject.ChangeSomeProperty();
  });
// Now myObject can change without affecting timer.Elapsed

Tools like ReSharper will try to detect whether local variables bound in lambdas may change, and will warn you if it detects this situation.

My recommended solution (using Task) would look something like this:

var ui = TaskScheduler.FromCurrentSynchronizationContext();
var localMyObject = this.myObject;
Task.Factory.StartNew(() =>
{
  // Run asynchronously on a ThreadPool thread.
  Thread.Sleep(1000); // TODO: review if you *really* need this   

  return DoTheCodeThatNeedsToRunAsynchronously();   
}).ContinueWith(task =>
{
  // Run on the UI thread when the ThreadPool thread returns a result.
  if (task.IsFaulted)
  {
    // Do some error handling with task.Exception
  }
  else
  {
    localMyObject.ChangeSomeProperty(task.Result);
  }
}, ui);

Note that since the UI thread is the one calling MyObject.ChangeSomeProperty, that method doesn't have to be threadsafe. Of course, DoTheCodeThatNeedsToRunAsynchronously still does need to be threadsafe.

Stephen Cleary
+1: Very insightful and helpful answer, thank you! I'll read your blog and consider using `Task` as you said.
chiccodoro
Uh oh, I see, Task is a .NET 4 feature. Unfortunately I forgot to mention I have .NET 3.5. Still the answer is very good for reference.
chiccodoro
`Task` is available for .NET 3.5 [here](http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx).
Stephen Cleary
@Hi Stephen: I'm sorry not to accept your answer, only because I have .NET 3.5, and Heinzi's code is what I've adopted now. I've read your comment but it was the simplest way for my situation just to use what is already there. If I could, I'd double-upvote your answer ;-) so it would appear right after Heinzi's, as a reference for all the future readers who might still be interested in the `Task` thing.
chiccodoro
+2  A: 

The bigger thread-safety concern here, in my mind, may be the 1 second Sleep. If this is required in order to synchronize with some other operation (giving it time to complete), then I strongly recommend using a proper synchronization pattern rather than relying on the Sleep. Monitor.Pulse or AutoResetEvent are two common ways to achieve synchronization. Both should be used carefully, as it's easy to introduce subtle race conditions. However, using Sleep for synchronization is a race condition waiting to happen.

Also, if you want to use a thread (and don't have access to the Task Parallel Library in .NET 4.0), then ThreadPool.QueueUserWorkItem is preferable for short-running tasks. The thread pool threads also won't hang up the application if it dies, as long as there is not some deadlock preventing a non-background thread from dying.

Dan Bryant
@Dan: QueueUserWorkItem sounds interesting. For the synchronization, I've also learnt some new things from your answer. Only, in my situation I actually have to wait for Excel to release a file. I guess after having mentioned the word "Excel" there's no need to expand on synchronisation :-) / :-(
chiccodoro
+1  A: 
STW
`Task` was carefully designed to replace all of these. Long-running tasks just use `TaskCreationOptions.LongRunning`. I recently wrote [a blog post](http://nitoprograms.blogspot.com/2010/08/various-implementations-of-asynchronous.html) comparing all of these approaches. `Task` is the clear winner except for just a handful of cases where `BackgroundWorker` is simpler. `QueueUserWorkItem` and `Thread` didn't even get an honorable mention. ;)
Stephen Cleary