views:

65

answers:

2

I am enhancing an existing algorithm that consists of multiple independent steps to use concurrent tasks. Each of the tasks will create multiple objects to hold its results. In the end, I would like to have a list of all the results to return from the controlling method. At the moment, my code looks something like that

private final ExecutorService pool = ...;

// A single task to be performed concurrently with other tasks.
private class WorkHorse implements Callable<Void> {
    private final Collection<X> collect;

    public WorkHorse(Collection<X> collect, ...) {
        this.collect = collect;
    }

    public Void call() {
        for (...) {
            // do work

            synchronized (this.collect) {
                this.collect.add(result);
            }
        }
        return null;
    }
}

// Uses multiple concurrent tasks to compute its result list.
public Collection<X> getResults() {
    // this list is supposed to hold the results
    final Collection<X> collect = new LinkedList<X>();

    final List<WorkHorse> tasks = Arrays.asList(  
        new WorkHorse(collect, ...), new WorkHorse(collect, ...), ...);
    this.pool.invokeAll(tasks);

    // ## A ##
    synchronized (collect) {
        return collect;
    }
}

Do I actually need the synchronized at "## A ##" to enforce a happens-before relationship with the modifying operations in the worker tasks? Or can I rely on all write operations to have happened after invokeAll returns and be visible to the controlling thread? And is there any reason, why I should not return the results collection from within its own synchronized block?

+2  A: 

No, you don't need that. The documentation of invokeAll states that all jobs should be done when it returns. So there should be no further access to collect when you reach the return statement.

Zed
Also, if these were *not* the semantics of invokeAll(), the synchronized block wouldn't help. You would seem to want some kind of gate that would prevent continuation until all the other threads were done writing.
Avi
Brian Goetz says in "Java Concurrency in Action" that "everything [thread] A did in or prior to a synchronized block is visible to [thread] B when it executes a synchronized block guarded by the same lock. Without synchronization, there is no such guarantee." The statement in the documentation of invokeAll doesn't seem to waive this requirement, or does it?
janko
@janko That statement is true, but there would be no guarantee that your synchronized return block is not run before all the worker threads reaching the synchronized part.invokeAll says that the threads had already been done all their work (i.e. reached return null;) when it returns.
Zed
@Zed: I accept that all threads have run and completed when the control method's synchronized block is reached. However, I am not yet convinced, that all the modifications of the worker tasks are guaranteed to be visible to the control method without the synchronized block.
janko
@janko: Wouldn't this also imply that because you did not put synchronization on "collect = new LinkedList<X>()", some of your threads might see collect as uninitialized and therefore report a null pointer exception?
Zed
@Zed: No, because the write "collect = new LinkedList<X>()" happens-before the construction of the WorkHorses and therefore is guaranteed to be visible to them. For the intricacies of memory visibility see, e.g., http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility ... However, the extended guarantees listed at the linked page do make my last synchronized block superfluous, I guess
janko
Okay, but creating WorkHorses objects have nothing to do with the starting of the threads in the ExecutorService.
Zed
A: 

You don't need the second synchronized if you have the first one in there. As Zed notes, invokeAll() will block until all tasks have completed. Meanwhile, the synchronization around add() will ensure that the changes to the collection are visible to all threads, including the original calling thread.

As for whether you need the first one (which you didn't ask) -- I tried removing both synchronized blocks and couldn't actually get it to fail, but having it in there is probably the safer bet. According to the javadoc for LinkedList:

If multiple threads access a LinkedList concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.

The other "2nd-generation" Collection implementations have similar warnings.

Note by the way that there's nothing magic about synchronizing on the collection itself. You could declare a separate mutex (any old Object would do) in the outer class, or synchronize on the outer class instance, and that would work just as well, so long as all WorkHorses synchronize on the same thing.

David Moles