I encountered a strange problem with our Windows C# / .NET application. It is a GUI application; my part is the included network component, encapsulated in an assembly. I do not know the code of the main/GUI application, though I could contact its developer.

Now the application's UI has buttons to "Start" and "Stop" the network engine. Both buttons work. To make my component thread-safe I am using a lock around three methods. I don't want a client to be able to call Stop() before Start() has finished. Additionally there is a polling timer.

I tried to show you as few lines as possible and simplified the problem:

private Timer actionTimer = new Timer(new
                TimerCallback(actionTimer_TimerCallback),
                null, Timeout.Infinite, Timeout.Infinite);

public void Start()
{
 lock (driverLock)
 {
  active = true;
  // Trigger the first timer event in 500ms
  actionTimer.Change(500, Timeout.Infinite);
 }
}

private void actionTimer_TimerCallback(object state)
{
 lock (driverLock)
 {
  if (!active) return;
  log.Debug("Before event");
  StatusEvent(this, new StatusEventArgs()); // it hangs here
  log.Debug("After event");
  // Now restart timer
  actionTimer.Change(500, Timeout.Infinite);
 }
}

public void Stop()
{
 lock (driverLock)
 {
  active = false;
 }
}

Here is how to reproduce my problem. As I said, the Start and Stop buttons both work, but if you press Start and then press Stop while the TimerCallback is executing, the TimerCallback never returns. It always hangs at the same position, the StatusEvent call. So the lock is never released and the GUI also hangs, because its call of the Stop() method cannot proceed.

Now I observed the following: If the application hangs because of this "deadlock" and I right-click the application in the task bar, it continues. It just works as expected then. Does anybody have an explanation, or better, a solution for this?

By the way, I also tried it with InvokeIfRequired as I don't know the internals of the GUI application. This is necessary if my StatusEvent would change something in the GUI. Since I have no reference to the GUI controls, I used (assuming only one target):

Delegate firstTarget = StatusEvent.GetInvocationList()[0];
ISynchronizeInvoke syncInvoke = firstTarget.Target as ISynchronizeInvoke;
if (syncInvoke.InvokeRequired)
{
  syncInvoke.Invoke(firstTarget, new object[] { this, new StatusEventArgs() });
}
else
{
  firstTarget.Method.Invoke(firstTarget.Target, new object[] { this, new StatusEventArgs() });
}

This approach didn't change the problem. I think this is because I am Invoking on the main application's event handlers, not on the GUI controls. So the main app is responsible for Invoking? But anyway, AFAIK not using Invoke although needed would not result in a deadlock like this but (hopefully) in an exception.

+1  A: 

If you don't have the source for the GUI (which you probably should) you can use Reflector to disassemble it. There is even a plugin to generate source files so you could run the app in your VS IDE and set breakpoints.

Cory Charlton
I cooperate with the GUI developer, but have no insight into the source. The question is how to fix this, and who has to do it: the GUI developer or myself?
Tarnschaf
+1  A: 

A couple things come to mind when reviewing your code. The first thing is that you are not checking for a null delegate before firing the status event. If no listeners are bound to the event, then this will cause an exception, which if not caught or handled, might cause strange issues in threaded code.

So the first thing I'd do is this:

if(StatusEvent != null)
{
  StatusEvent(this, new StatusEventArgs());
}

The other thing that comes to mind is that perhaps your lock is failing you in some manner. What type of object are you using for the lock? The simplest thing to use is just a plain ole "object", but you must ensure you are not using a value type (e.g. int, float, etc.) that would be boxed for locking, thus never really establishing a lock since each lock statement would box and create a new object instance. You should also keep in mind that a lock only keeps "other" threads out. If called on the same thread, then it will sail through the lock statement.
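The two lock properties mentioned above can be checked with a small sketch. The class and method names here (LockDemo, BoxesAreDistinct, CanReenterOnSameThread) are made up for illustration and are not part of the asker's component. Note that, as far as I know, the C# compiler rejects a `lock` statement on a value-type expression outright, so the boxing trap mostly shows up when calling Monitor.Enter directly with a value type.

```csharp
using System;
using System.Threading;

public static class LockDemo
{
    // A plain private object, used only for locking.
    private static readonly object Gate = new object();

    // Boxing the same int twice yields two distinct objects, so a call like
    // Monitor.Enter(someInt) would take a different lock every time.
    public static bool BoxesAreDistinct()
    {
        int value = 42;
        object first = value;   // box #1
        object second = value;  // box #2
        return !ReferenceEquals(first, second);
    }

    // Monitors are re-entrant: the thread that already owns the lock can
    // acquire it again without blocking.
    public static bool CanReenterOnSameThread()
    {
        lock (Gate)
        {
            bool reentered = Monitor.TryEnter(Gate, 0);
            if (reentered)
            {
                Monitor.Exit(Gate); // balance the nested acquisition
            }
            return reentered;
        }
    }
}
```

Both methods should return true, which is why a lock alone cannot stop the same thread from re-entering Start() or Stop() from inside an event handler.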

Michael McCloskey
Thanks, I didn't post the actual code: I check for empty event handlers where applicable. Please note that even your recommendation may fail if StatusEvent is set to null exactly after the if! You have to copy the delegate first to be REALLY sure. Second: My driverLock is actually a plain private object just created for locking.
Tarnschaf
A: 

Not having access to the GUI source makes this harder, but a general tip here... The WinForm GUI is not managed code, and doesn't mix well with .NET threading. The recommended solution for this is to use a BackgroundWorker to spawn a thread that is independent of the WinForm. Once you're running in the thread started by the BackgroundWorker, you're in pure managed code and you can use .NET's timers and threading for pretty much anything. The restriction is that you have to use the BackgroundWorker's events to pass information back to the GUI, and your thread started by the BackgroundWorker can't access the Winform controls.

Also, you'd be well off to disable the "Stop" button while the "Start" task is running, and vice versa. But a BackgroundWorker is still the way to go; that way the WinForm doesn't hang while the background thread is running.

Cylon Cat
So the WinForm might stop receiving events and by the right-click on the task bar icon it is somehow triggered (like when receiving a Windows Message) and processes the .NET event. If the form would actually be listening for the events, this might be the point. I will ask the GUI programmer!
Tarnschaf
Yes. The WinForm is always listening for events, and it listens to the Windows messages, not .NET events. The WinForm translates those external events into events for the WinForm controls.
Cylon Cat
To clarify, the WinForm is listening for events when it's not doing anything else. As long as your code is doing something, and you're waiting for your task to complete, you **need** to give control back to the Winform; it can't get back to its own event listener until your code finishes and cleans up. Even though the WinForm is a .NET app, the window frame and drawing surface are running in Win32, using Win32 events, threads, and timers. That doesn't mix well with .NET, and using that BackgroundWorker class and events is how you get around that.
Cylon Cat
Thanks, but I still don't understand why it should listen for events again immediately if I click on the task bar icon..
Tarnschaf
A: 

A wild guess here: Could the status message somehow be causing the other app to call your Stop task?

I would put debug stuff at the start of all three methods, see if you're deadlocking on yourself.

Loren Pechtel
Thanks. I'm pretty sure this is not the case. I'm also pretty sure it's not a classical deadlock because of the way the app continues after the right click..
Tarnschaf
+1  A: 

Yes, this is a classic deadlock scenario. The StatusEvent cannot proceed because it needs the UI thread to update the controls. The UI thread, however, is stuck trying to acquire the driverLock, which is held by the code that calls StatusEvent. Neither thread can proceed.

Two ways to break the lock:

  • the StatusEvent code might not necessarily need to run synchronously. Use BeginInvoke instead of Invoke.
  • the UI thread might not necessarily need to wait for the thread to stop. Your thread could notify it later.

There is not enough context in your snippets to decide which one is better.
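A minimal sketch of the first option, applied on the component side in the style of the asker's InvokeIfRequired snippet. It assumes that the subscribers which need marshalling are methods on objects implementing ISynchronizeInvoke (WinForms controls and forms do); the helper name EventRaiser.RaiseAsync is hypothetical. If the GUI's handler lives on some other class, the GUI code itself would have to call BeginInvoke.

```csharp
using System;
using System.ComponentModel;

public static class EventRaiser
{
    // Raises every subscriber of 'handlers'. Targets that must run on another
    // thread are posted with BeginInvoke, so the caller never waits for them.
    public static void RaiseAsync(Delegate handlers, object sender, EventArgs args)
    {
        if (handlers == null)
        {
            return; // nobody subscribed
        }

        foreach (Delegate target in handlers.GetInvocationList())
        {
            ISynchronizeInvoke sync = target.Target as ISynchronizeInvoke;
            if (sync != null && sync.InvokeRequired)
            {
                // Queue the call on the target's thread and return immediately.
                sync.BeginInvoke(target, new object[] { sender, args });
            }
            else
            {
                // No marshalling needed (or not possible): call directly.
                target.DynamicInvoke(sender, args);
            }
        }
    }
}
```

Because BeginInvoke only queues the call, the timer callback can leave the lock even while the UI thread is blocked in Stop(); the queued update runs once the UI thread is free again.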

Note that you might have a potential race on the timer too, it isn't visible in your snippet. But the callback might run a microsecond after the timer was stopped. Avoid this kind of headache by using a real thread instead of a timer callback. It can do things periodically by calling WaitOne() on a ManualResetEvent, passing a timeout value. That ManualResetEvent is good to signal the thread to stop.
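A rough sketch of that dedicated-thread idea follows. The class name PollingWorker, the Status event and the interval parameter are assumptions for illustration, and the sketch assumes Start() and Stop() are only ever called from one thread (for example the UI thread).

```csharp
using System;
using System.Threading;

public sealed class PollingWorker
{
    private readonly int intervalMs;
    private ManualResetEvent stopSignal; // null while the worker is stopped

    public event EventHandler Status;

    public PollingWorker(int intervalMs)
    {
        this.intervalMs = intervalMs;
    }

    public void Start()
    {
        if (stopSignal != null)
        {
            return; // already running
        }
        ManualResetEvent signal = new ManualResetEvent(false);
        stopSignal = signal;
        Thread thread = new Thread(() => Run(signal));
        thread.IsBackground = true; // do not keep the process alive
        thread.Start();
    }

    public void Stop()
    {
        ManualResetEvent signal = stopSignal;
        stopSignal = null;
        if (signal != null)
        {
            signal.Set(); // wake the worker; Stop() itself never blocks
        }
    }

    private void Run(ManualResetEvent signal)
    {
        // WaitOne returns false on timeout (time to poll again) and
        // true once Stop() has signalled the event.
        while (!signal.WaitOne(intervalMs))
        {
            EventHandler handler = Status; // local copy, see the null-check discussion
            if (handler != null)
            {
                handler(this, EventArgs.Empty);
            }
        }
        signal.Close();
    }
}
```

Each Start() gets its own event object, so a quick Stop/Start sequence cannot accidentally revive a worker that was already told to stop. Since Stop() only signals and returns, the UI thread is never left waiting on the worker.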

Hans Passant
Thank you, the tip with BeginInvoke looks most promising until now. I will try and inform you. I'm afraid you didn't explain - if this is a deadlock - why it is "solved" by right clicking the task bar entry. Any ideas?
Tarnschaf
No idea, there's not enough code to tell. All I can see is that what's there is likely to deadlock.
Hans Passant
+5  A: 

As for why right-click "unlocks" your application, my "educated guess" of events that lead to this behaviour is as follows:

  1. (when your component was created) GUI registered a subscriber to the status notification event
  2. Your component acquires lock (in a worker thread, not GUI thread), then fires status notification event
  3. The GUI callback for status notification event is called and it starts updating GUI; the updates are causing events to be sent to the event loop
  4. While the update is going on, "Start" button gets clicked
  5. Win32 sends a click message to the GUI thread and tries to handle it synchronously
  6. Handler for the "Start" button gets called, it then calls "Start" method on your component (on GUI thread)
  7. Note that the status update has not finished yet; start button handler "cut in front of" the remaining GUI updates in status update (this actually happens quite a bit in Win32)
  8. "Start" method tries to acquire your component's lock (on GUI thread), blocks
  9. GUI thread is now hung (waits for start handler to finish; start handler waits for lock; the lock is held by worker thread that marshalled a GUI update call to GUI thread and waits for the update call to finish; the GUI update call marshalled from worker thread is waiting for start handler that cut in front of it to finish; ...)
  10. If you now right-click on taskbar, my guess is that taskbar manager (somehow) starts a "sub-event-loop" (much like modal dialogs start their own "sub-event-loops", see Raymond Chen's blog for details) and processes queued events for the application
  11. The extra event loop triggered by the right-click can now process the GUI updates that were marshalled from the worker thread; this unblocks the worker thread; this in turn releases the lock; this in turn unblocks application's GUI thread so it can finish handling start button click (because it can now acquire the lock)

You could test this theory by causing your application to "bite", then breaking into debugger and looking at the stack trace of the worker thread for your component. It should be blocked in some transition to GUI thread. The GUI thread itself should be blocked in the lock statement, but down the stack you should be able to see some "cut in front of the line" calls...

I think the first recommendation to be able to track this issue down would be to turn on the flag Control.CheckForIllegalCrossThreadCalls = true;.

Next, I would recommend firing the notification event outside of the lock. What I usually do is gather information needed by an event inside a lock, then release the lock and use the information I gathered to fire the event. Something along the lines:

string status;
lock (driverLock) {
    if (!active) { return; }
    status = ...
    actionTimer.Change(500, Timeout.Infinite);
}
StatusEvent(this, new StatusEventArgs(status));

But most importantly, I would review who are the intended clients of your component. From the method names and your description I suspect GUI is the only one (it tells you when to start and stop; you tell it when your status changes). In that case you should not be using a lock. Start & stop methods could simply be setting and resetting a manual-reset event to indicate whether your component is active (a semaphore, really).
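To make the last suggestion concrete, here is a sketch under the stated assumption that the GUI is the only client. The class name EventGatedComponent and the interval parameter are hypothetical, and disposal of the timer and the event is omitted for brevity.

```csharp
using System;
using System.Threading;

public sealed class EventGatedComponent
{
    // Signalled while the component is active; replaces the bool + lock.
    private readonly ManualResetEvent active = new ManualResetEvent(false);
    private readonly Timer timer;
    private readonly int intervalMs;

    public event EventHandler Status;

    public EventGatedComponent(int intervalMs)
    {
        this.intervalMs = intervalMs;
        timer = new Timer(OnTimer, null, Timeout.Infinite, Timeout.Infinite); // not started yet
    }

    public void Start()
    {
        active.Set();
        timer.Change(intervalMs, Timeout.Infinite); // first tick after one interval
    }

    public void Stop()
    {
        active.Reset(); // never blocks, so the GUI thread cannot hang here
    }

    private void OnTimer(object state)
    {
        if (!active.WaitOne(0))
        {
            return; // stopped: do not fire and do not re-arm
        }
        EventHandler handler = Status; // local copy
        if (handler != null)
        {
            handler(this, EventArgs.Empty);
        }
        if (active.WaitOne(0))
        {
            timer.Change(intervalMs, Timeout.Infinite); // re-arm only while still active
        }
    }
}
```

One limitation of this sketch: if Start() is called again while a slow handler is still running, a second callback can overlap the first, so the handler has to tolerate that.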

[update]

In trying to reproduce your scenario I wrote the following simple program. You should be able to copy the code, compile and run it without problems (I built it as a console application that starts a form :-) )

using System;
using System.Threading;
using System.Windows.Forms;

using Timer=System.Threading.Timer;

namespace LockTest
{
    public static class Program
    {
        // Used by component's notification event
        private sealed class MyEventArgs : EventArgs
        {
            public string NotificationText { get; set; }
        }

        // Simple component implementation; fires notification event 500 msecs after previous notification event finished
        private sealed class MyComponent
        {
            public MyComponent()
            {
                this._timer = new Timer(this.Notify, null, -1, -1); // not started yet
            }

            public void Start()
            {
                lock (this._lock)
                {
                    if (!this._active)
                    {
                        this._active = true;
                        this._timer.Change(TimeSpan.FromMilliseconds(500d), TimeSpan.FromMilliseconds(-1d));
                    }
                }
            }

            public void Stop()
            {
                lock (this._lock)
                {
                    this._active = false;
                }
            }

            public event EventHandler<MyEventArgs> Notification;

            private void Notify(object ignore) // this will be invoked in the context of a threadpool worker thread
            {
                lock (this._lock)
                {
                    if (!this._active) { return; }
                    var notification = this.Notification; // make a local copy
                    if (notification != null)
                    {
                        notification(this, new MyEventArgs { NotificationText = "Now is " + DateTime.Now.ToString("o") });
                    }
                    this._timer.Change(TimeSpan.FromMilliseconds(500d), TimeSpan.FromMilliseconds(-1d)); // rinse and repeat
                }
            }

            private bool _active;
            private readonly object _lock = new object();
            private readonly Timer _timer;
        }

        // Simple form to exercise our component
        private sealed class MyForm : Form
        {
            public MyForm()
            {
                this.Text = "UI Lock Demo";
                this.AutoSize = true;
                this.AutoSizeMode = AutoSizeMode.GrowAndShrink;

                var container = new FlowLayoutPanel { FlowDirection = FlowDirection.TopDown, Dock = DockStyle.Fill, AutoSize = true, AutoSizeMode = AutoSizeMode.GrowAndShrink };
                this.Controls.Add(container);
                this._status = new Label { Width = 300, Text = "Ready, press Start" };
                container.Controls.Add(this._status);
                this._component.Notification += this.UpdateStatus;
                var button = new Button { Text = "Start" };
                button.Click += (sender, args) => this._component.Start();
                container.Controls.Add(button);
                button = new Button { Text = "Stop" };
                button.Click += (sender, args) => this._component.Stop();
                container.Controls.Add(button);
            }

            private void UpdateStatus(object sender, MyEventArgs args)
            {
                if (this.InvokeRequired)
                {
                    Thread.Sleep(2000);
                    this.Invoke(new EventHandler<MyEventArgs>(this.UpdateStatus), sender, args);
                }
                else
                {
                    this._status.Text = args.NotificationText;
                }
            }

            private readonly Label _status;
            private readonly MyComponent _component = new MyComponent();
        }

        // Program entry point, runs event loop for the form that exercises our component
        public static void Main(string[] args)
        {
            Control.CheckForIllegalCrossThreadCalls = true;
            Application.EnableVisualStyles();
            using (var form = new MyForm())
            {
                Application.Run(form);
            }
        }
    }
}

As you can see, the code has 3 parts: first, the component that uses a timer to call the notification method every 500 milliseconds; second, a simple form with a label and start/stop buttons; and finally the main function to run the event loop.

You can deadlock the application by clicking start button and then within 2 seconds clicking stop button. However, the application is not "unfrozen" when I right-click on taskbar, sigh.

When I break into the deadlocked application, this is what I see when switched to the worker (timer) thread:

[Screenshot: Worker thread]

And this is what I see when switched to the main thread:

[Screenshot: Main thread]

I would appreciate if you could try compiling and running this example; if it works the same for you as me, you could try updating the code to be more similar to what you have in your application and perhaps we can reproduce your exact issue. Once we reproduce it in a test application like this, it shouldn't be a problem to refactor it to make the problem go away (we would isolate essence of the problem).

[update 2]

I guess we agree that we can't easily reproduce your behaviour with the example I provided. I'm still pretty sure the deadlock in your scenario is broken by an extra event loop being introduced on right-click, and by this event loop processing the messages pending from the notification callback. However, how this is achieved is beyond me.

That said I would like to make the following recommendation. Could you try these changes in your application and let me know if they solved the deadlock problem? Essentially, you would move ALL component code to worker threads (i.e. nothing that has to do with your component will be running on GUI thread any more except code to delegate to worker threads :-) )...

        public void Start()
        {
            ThreadPool.QueueUserWorkItem(delegate // added
            {
                lock (this._lock)
                {
                    if (!this._active)
                    {
                        this._active = true;
                        this._timer.Change(TimeSpan.FromMilliseconds(500d), TimeSpan.FromMilliseconds(-1d));
                    }
                }
            });
        }

        public void Stop()
        {
            ThreadPool.QueueUserWorkItem(delegate // added
            {
                lock (this._lock)
                {
                    this._active = false;
                }
            });
        }

I moved body of Start and Stop methods into a thread-pool worker thread (much like your timers call your callback regularly in context of a thread-pool worker). This means GUI thread will never own the lock, the lock will only be acquired in context of (probably different for each call) thread-pool worker threads.

Note that with the change above, my sample program doesn't deadlock any more (even with "Invoke" instead of "BeginInvoke").

[update 3]

As per your comment, queueing the Start method is not acceptable because it needs to indicate whether the component was able to start. In this case I would recommend treating the "active" flag differently. You would switch to "int" (0 stopped, 1 running) and use the "Interlocked" static methods to manipulate it (I assume that your component exposes more state; you would guard access to anything other than the "active" flag with your lock):

        public bool Start()
        {
            if (0 == Interlocked.CompareExchange(ref this._active, 0, 0)) // will evaluate to true if we're not started; this is a variation on the double-checked locking pattern, without the problems associated with lack of memory barriers (see http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html)
            {
                lock (this._lock) // serialize all Start calls that are invoked on an un-started component from different threads
                {
                    if (this._active == 0) // make sure only the first Start call gets through to actual start, 2nd part of double-checked locking pattern
                    {
                        // run component startup

                        this._timer.Change(TimeSpan.FromMilliseconds(500d), TimeSpan.FromMilliseconds(-1d));
                        Interlocked.Exchange(ref this._active, 1); // now mark the component as successfully started
                    }
                }
            }
            return true;
        }

        public void Stop()
        {
            Interlocked.Exchange(ref this._active, 0);
        }

        private void Notify(object ignore) // this will be invoked in the context of a threadpool worker thread
        {
            if (0 != Interlocked.CompareExchange(ref this._active, 0, 0)) // only handle the timer event in started components (notice the pattern is the same as in Start method except for the return value comparison)
            {
                lock (this._lock) // protect internal state
                {
                    if (this._active != 0)
                    {
                        var notification = this.Notification; // make a local copy
                        if (notification != null)
                        {
                            notification(this, new MyEventArgs { NotificationText = "Now is " + DateTime.Now.ToString("o") });
                        }
                        this._timer.Change(TimeSpan.FromMilliseconds(500d), TimeSpan.FromMilliseconds(-1d)); // rinse and repeat
                    }
                }
            }
        }

        private int _active;
Milan Gardian
Thanks for the exhaustive answer and for noticing the actual question. I didn't understand you fully yet but I will give it a try soon!
Tarnschaf
I hope you will notice that I updated the answer with a program that attempts to reproduce your issue; I can get the application to deadlock, but it won't resume after I right-click the taskbar, so you'll have to try to update the code to resemble your situation and hopefully we can reproduce your behaviour fully...
Milan Gardian
First, thank you very much! I tried your app, ported it to .NET 2.0, and changed some things that could be different, but I see the same behaviour you experienced: a real deadlock, and the reason is clear. I contacted the GUI developer for more details. A BeginInvoke instead of Invoke solves the deadlock in this simple case. Perhaps the messages are handled differently when the component is in another DLL / address space?
Tarnschaf
Yes I noticed that there was no deadlock with BeginInvoke but I wanted to come as close as possible to your deadlock scenario :-)
Milan Gardian
So I updated the answer some more - I hope update 2 solves your problem (even though the current behaviour remains unexplained). Could you please comment on whether this works?
Milan Gardian
Yeah, intuitively I would say starting threads for pressed buttons has to be done by the GUI! Problem is the original Start method returns a boolean, so I would have to change to Async calls with callbacks. I think about using your approach for the Stop method anyway, so it is non-blocking and still threadsafe.
Tarnschaf
Yet another update then :-)). In update 3, I switched to using Interlocked to implement a simple double-checked locking to be able to check "active" flag without having to acquire lock.
Milan Gardian
The third update still produces a deadlock if you start and stop, then start and stop again in quick succession.
Mustafa A. Jabbar
A: 

Thank you Tarnschaf and Milan Gardian. I was experiencing a similar problem and your suggestions really helped me out.

Geert