I'm having deadlock problems with this piece of code:


import multiprocessing


def _entropy_split_parallel(data_train, answers_train, weights):
    CPUS = 1 #multiprocessing.cpu_count()
    NUMBER_TASKS = len(data_train[0])
    processes = []

    multi_list = zip(data_train, answers_train, weights)

    task_queue = multiprocessing.Queue()
    done_queue = multiprocessing.Queue()

    for feature_index in xrange(NUMBER_TASKS):
        task_queue.put(feature_index)

    for i in xrange(CPUS):
        process = multiprocessing.Process(target=_worker, 
                args=(multi_list, task_queue, done_queue))
        processes.append(process)
        process.start()

    min_entropy = None
    best_feature = None
    best_split = None
    for i in xrange(NUMBER_TASKS):
        entropy, feature, split = done_queue.get()
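        # Track the running minimum; without updating min_entropy the
        # last non-None result would always win.
        if entropy is not None and (min_entropy is None or entropy < min_entropy):
            min_entropy = entropy
            best_feature = feature
            best_split = split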

    for i in xrange(CPUS):
        task_queue.put('STOP')

    for process in processes:
        process.join()

    return best_feature, best_split


def _worker(multi_list, task_queue, done_queue):
    feature_index = task_queue.get()
    while feature_index != 'STOP':
        result = _entropy_split3(multi_list, feature_index)
        done_queue.put(result)
        feature_index = task_queue.get()

When I run my program, it works fine for several runs through _entropy_split_parallel, but eventually deadlocks. The parent process is blocking on done_queue.get(), and the worker process is blocking on done_queue.put(). Since the queue is always empty when this happens, blocking on get is expected. What I don't understand is why the worker is blocking on put, since the queue is obviously not full (it's empty!). I've tried the block and timeout keyword arguments, but get the same result.

I'm using the multiprocessing backport, since I'm stuck with Python 2.5.


EDIT: It looks like I'm also getting deadlock issues with one of the examples provided with the multiprocessing module. It's the third example from the bottom here. The deadlocking only seems to occur if I call the test method many times. For example, changing the bottom of the script to this:


if __name__ == '__main__':
    freeze_support()
    for x in xrange(1000):
        test()

EDIT: I know this is an old question, but testing shows that this is no longer a problem on Windows with Python 2.7. I will try Linux and report back.

A: 

What happens if you use get_nowait() in the parent process?

Russell Borogove
The parent process dies with a Queue.Empty exception, because it is trying to get on an empty queue.
ajduff574
+1  A: 

I think the problem is the parent thread joining a child thread to which it has passed a Queue. This is discussed in the multiprocessing module's programming guidelines section.

At any rate, I encountered the same symptom that you describe, and when I refactored my logic so that the master thread did not join the child threads, there was no deadlock. My refactored logic involved knowing the number of items that I should get from the results or "done" queue (which can be predicted from the number of child threads, the number of items on the work queue, etc.), and looping until all of them had been collected.

"Toy" illustration of the logic:

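import time

# figure_it_out() is a placeholder for however you predict the result count
num_items_expected = figure_it_out(work_queue, num_threads)
items_received = []
while len(items_received) < num_items_expected:
    items_received.append(done_queue.get())  # blocks until a result arrives
    time.sleep(5)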

The above logic avoids the need for the parent thread to join the child thread, yet allows the parent thread to block until all the children are done. This approach avoided my deadlock problems.
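A fuller sketch of the same idea is shown here, with some assumptions: it targets Python 3's standard multiprocessing module rather than the 2.5 backport, and the names square, worker, collect and run_parallel are hypothetical, not taken from the question. The parent reads exactly as many results as it queued tasks before it sends the 'STOP' sentinels. The join is kept, but only after the done queue has been drained; drop it entirely to match the approach above more closely.

```python
import multiprocessing

STOP = "STOP"  # sentinel telling a worker to exit, as in the question


def square(x):
    # Hypothetical stand-in for the real per-task computation.
    return x * x


def worker(func, task_queue, done_queue):
    # Process tasks until the STOP sentinel arrives.
    for task in iter(task_queue.get, STOP):
        done_queue.put((task, func(task)))


def collect(done_queue, num_expected):
    # Block until exactly num_expected results have been read.
    return [done_queue.get() for _ in range(num_expected)]


def run_parallel(func, tasks, num_workers=2):
    tasks = list(tasks)
    task_queue = multiprocessing.Queue()
    done_queue = multiprocessing.Queue()
    for task in tasks:
        task_queue.put(task)

    processes = [
        multiprocessing.Process(target=worker,
                                args=(func, task_queue, done_queue))
        for _ in range(num_workers)
    ]
    for process in processes:
        process.start()

    # Drain every expected result first, so that no worker is left
    # holding buffered items for done_queue when it tries to exit.
    results = collect(done_queue, len(tasks))

    for _ in processes:
        task_queue.put(STOP)
    # Optional: joining happens only after the results were drained.
    for process in processes:
        process.join()
    return dict(results)
```

Calling run_parallel(square, range(5)) from under an if __name__ == '__main__': guard should return a dict mapping each task to its result. Results arrive in whatever order the workers finish, which is why each one is paired with its task.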

Jeet
I think all the queues should be empty when the processes are joined, so this shouldn't be a problem. Plus, the master process is deadlocking on put, rather than join. I just upgraded Python (I was stuck with an old version), so I will test this out again.
ajduff574
@ajduff In my case, too, the deadlock happened on the put rather than on the join, except that the put was in the child thread. Also, in my case, the queue that was being put into was empty. So I think it is worth a shot (i.e., avoiding the master thread joining the child threads) in your case as well.
Jeet