views:

128

answers:

6

Hi all,

I am having troubles with implementing a simple work queue. Doing some analysis, I am facing a subtle problem. The work queue is backed by a regular linked list. The code looks like this (simplified):

0. while (true)  
    1. while (enabled == true)  
        2. acquire lock on the list and get the next action to be executed (blocking operation)  (store it in a local variable)  
        3. execute the action (outside the lock on the list on previous line)
    4. get lock on this work queue  
        5. wait until this work queue has been notified (triggered when setEnabled(true) has been callled)

The setEnabled(e) operation looks like this (simplified):

enabled = e
if (enabled == true)
    acquire lock on this work queue and do notify()

Although this works, there is a condition in which a deadlock occurs. It happens in the following rare situation:

  • while an action is being executed during step (3), setEnabled(false) is called
  • just before step (4) is entered, setEnabled(true) is called
  • now step (5) keeps waiting forever, because this work queue has already been notified but we missed it

How do I solve this? I have been looking at this for some time, but I cannot come up with a solution.

Please note I am fairly new to thread synchronization.

Thanks a lot.

A: 

If you think that the problem is a missed notification, then just store another boolean flag, such as "wasNotified". Then at step 5, do a synchronized check of wasNotified. If you were notified, then repeat step 1, otherwise wait for another notification.

You might also look into using a condition variable to implement this (although the Condition class might not be available to you if you are limited to 1.1).

YGL
+1  A: 

Not sure how the multithreaded memory model is on mobile Java. For Desktop Java I am pretty sure it had some serious bugs all the way up to Java 1.5.

Would be easier to troubleshoot with real Java code over psuedocode.. but I wouldn't be surprised if this was a Java 1.1 bug and not a code bug...

bwawok
A: 

I believe that the following 2 changes will fix the deadlock bug:

1) change setEnabled as follows:

setEnabled(e){
    acquire lock on Q{
        enabled = e 
        if (enabled == true) 
             do Q.notify() 
    }
}

2) Add condition to step 5:

if enabled=false, wait until this work queue has been notified (Q.wait())

This approach guarantees that the wait of step 5 will occur only if the enabled flag is currently down.

Eyal Schneider
I understand the second step. But why did you put the assignment 'enabled = e' into the lock? I cannot find a reason why this is necessary.
John Deerikio
@John: it is not necessary for the deadlock situation, but it synchronizes the access to 'enabled' flag properly. If multiple threads read/write a variable, the accesses should be either synchronized on the same lock, or the variable should be declared as volatile. Otherwise, thread visibility issues may arise.
Eyal Schneider
A: 

You should make sure that you check and wait for conditions in a loop, otherwise you may cancel the wait prematurely and that can lead to a deadlock later on:

change this:

if ( ! condition )
{
    wait( );
}

to this:

while ( ! condition )
{
    wait( );
}

This comes straight from the documentation for wait.

Alexander Pogrebnyak
Thanks for the pointer, I still do not understand why, but I'll do some research to find out.
John Deerikio
A: 

You are trying to create a worker which can be enabled/disabled which is different from a worker which processes tasks when available. You need to synchronize around the enabled state, not the queue.

while (true)  
    synchronize (enabled)
        while (!enabled)
            enabled.wait();
    task = taskList.take();
    task.run();

You can reduce the synchronization of enabled (if you are willing to accept visibility problems) by first doing an if(!enabled) check prior to the synchronize block. Basically, the broken paradigm known as double check locking.

Tim Bender
I understand why I have to synchronize on the 'enabled' instead of the 'queue', but is this a big error? Isn't think 'as long as I synchronize on something, it's correct' flawed thinking?
John Deerikio
I think this solution is not completely correct. Say the queue list is empty, and enabled is set to true. This means that the function taskList.take() is blocking. While this thread is waiting, setEnabled(false) is called. Now, once a task is added, the task is executed anyway, as the enabled flag is ignored.
John Deerikio
@John, your implementation in the original post suffers from the same flaw. So I assumed it was acceptable/intended that the worker execute at least one task once enabled. Also, yes, you could sync around anything, but using the queue to sync will mean that enabling/disabling has to wait for add/get operations on your list.
Tim Bender
A: 

Here is my code so far based on your replies. This class is an innner class of a public class called WorkQueue and holds the main logic.

protected class Worker extends Thread {
    public void run() {
        while (true) {
            for (Runnable r; isEnabled;) { /* [1] */
                synchronized (queue) {
                    while (queue.isEmpty()) {
                        try {
                            queue.wait();
                        } catch (InterruptedException e) {
                        }
                    }
                    if (!isEnabled) { /* [2] - isEnabled has been set to false in the meantime, do not continue (fail-fast behavior) */
                        break;
                    }
                    r = (Runnable) queue.removeFirst();
                }
                r.run();
            }
            synchronized (this) {
                while (!isEnabled) { /* [4] */
                    try {
                        wait();
                    } catch (InterruptedException e) {
                    }
                }
            }
        }
    }

    protected boolean isEnabled = true;

    protected void setEnabled(final boolean b) {
        synchronized (this) {
            if (isEnabled = b) { /* [3] */
                notify();
            }
        }
    }
}

My main concern is if the 'isEnabled == true' check in [1] and [2] (denoted in the code) also needs to be synchronized?

I also still do not understand why the assignment 'isEnabled = e' in [3] has to be inside the lock and why [4] has to be a while instead of an if. I'll do some more research.

Thanks a lot all.

John Deerikio