views:

107

answers:

6

Hi, in trying to improve my understanding on concurrency issues, I am looking at the following scenario (Edit: I've changed the example from List to Runtime, which is closer to what I am trying):

public class Example {
    private final Object lock = new Object();
    private final Runtime runtime = Runtime.getRuntime();
    public void add(Object o) { 
        synchronized (lock) { runtime.exec(program + " -add "+o); } 
    }
    public Object[] getAll() { 
        synchronized (lock) { return runtime.exec(program + " -list "); }
    }
    public void remove(Object o) { 
        synchronized (lock) { runtime.exec(program + " -remove "+o); } 
    }
}

As it stands, each method is by thread safe when used standalone. Now, what I'm trying to figure out is how to handle where the calling class wishes to call:

for (Object o : example.getAll()) {
    // problems if multiple threads perform this operation concurrently
    example.remove(b); 
}

But as noted, there is no guarantee that the state will be consistent between the call to getAll() and the calls to remove(). If multiple threads call this, I'll be in trouble. So my question is - How should I enable the developer to perform the operation in a thread safe manner? Ideally I wish to enforce the thread safety in a way that makes it difficult for the developer to avoid/miss, but at the same time not complicated to achieve. I can think of three options so far:

A: Make the lock 'this', so the synchronization object is accessible to calling code, which can then wrap the code blocks. Drawback: Hard to enforce at compile time:

synchronized (example) {
    for (Object o : example.getAll()) {
        example.remove(b);
    }
}

B: Place the combined code into the Example class - and benefit from being able to optimize the implementation, as in this case. Drawback: Pain to add extensions, and potential mixing unrelated logic:

public class Example {
   ...
   public void removeAll() { 
       synchronized (lock) { Runtime.exec(program + " -clear"); } 
   }
}

C: Provide a Closure class. Drawback: Excess code, potentially too generous of a synchronization block, could in fact make deadlocks easier:

public interface ExampleClosure {
    public void execute(Example example);
}
public Class Example {
    ...
    public void execute(ExampleClosure closure) { 
        synchronized (this) { closure.execute(this); } 
    }
}

example.execute(new ExampleClosure() {
        public void execute(Example example) { 
            for (Object o : example.getAll()) {
                example.remove(b);
            }
        }
    }
);

Is there something I'm missing? How should synchronization be scoped to ensure the code is thread safe?

+2  A: 

I think j.u.c.CopyOnWriteArrayList is a good example of similar problem you're trying to solve.

JDK had a similar problem with Lists - there were various ways to synchronize on arbitrary methods, but no synchronization on multiple invocations (and that's understandable).

So CopyOnWriteArrayList actually implements the same interface but has a very special contract, and whoever calls it, is aware of it.

Similar with your solution - you should probably implement List (or whatever interface this is) and at the same time define special contracts for existing/new methods. For example, getAll's consistency is not guaranteed, and calls to .remove do not fail if o is null, or isn't inside the list, etc. If users want both combined and safe/consistent options - this class of yours would provide a special method that does exactly that (e.g. safeDeleteAll), leaving other methods close to original contract as possible.

So to answer your question - I would pick option B, but would also implement interface your original object is implementing.

mindas
+1  A: 

From the Javadoc for List.toArray():

The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array even if this list is backed by an array). The caller is thus free to modify the returned array.

Maybe I don't understand what you're trying to accomplish. Do you want the Object[] array to always be in-sync with the current state of the List? In order to achieve that, I think you would have to synchronize on the Example instance itself and hold the lock until your thread is done with its method call AND any Object[] array it is currently using. Otherwise, how will you ever know if the original List has been modified by another thread?

Jim Tough
Sorry, the example was a bit unclear. I've removed the reference to List. I'm just trying to figure out the best way to deal with: Blah result = example.method1(); example.method2(result); where the call to method2 may cause problems because inbetween method1 and method2, another thread went and called method3().
Mike
+2  A: 

Use a ReentrantReadWriteLock which is exposed via the API. That way, if someone needs to synchronize several API calls, they can acquire a lock outside of the method calls.

Aaron Digulla
Isn't that the same as option A, using 'this' as the synchronization monitor? I do see the benefit of your suggestion by making it harder to accidentally synchronize on it without thinking. In which case, would: public Object getLock() { return ...; } make more sense?
Mike
That might also work since synchronized locks are also reentrant (same thread can get the same lock twice without blocking). My approach has the benefit that several threads can call `get()` at the same time without blocking.
Aaron Digulla
+1  A: 

I think everyone is missing his real problem. When iterating over the new array of Object's and trying to remove one at a time the problem is still technically unsafe (though ArrayList implantation would not explode, it just wouldnt have expected results).

Even with CopyOnWriteArrayList there is the possibility that there is an out of date read on the current list to when you are trying to remove.

The two suggestions you offered are fine (A and B). My general suggestion is B. Making a collection thread-safe is very difficult. A good way to do it is to give the client as little functionality as possible (within reason). So offering the removeAll method and removing the getAll method would suffice.

Now you can at the same time say, 'well I want to keep the API the way it is and let the client worry about additional thread-safety'. If thats the case, document thread-safety. Document the fact that a 'lookup and modify' action is both non atomic and non thread-safe.

Today's concurrent list implementations are all thread safe for the single functions that are offered (get, remove add are all thread safe). Compound functions are not though and the best that could be done is documenting how to make them thread safe.

John V.
+1  A: 

You have to use the appropriate granularity when you choose what to lock. What you're complaining about in your example is too low a level of granularity, where the lock doesn't cover all the methods that have to happen together. You need to make methods that combine all the actions that need to happen together within the same lock.

Locks are reentrant so the high-level method can call low-level synchronized methods without a problem.

Nathan Hughes
It does seem a sensible approach, where I have difficulty is delineating methods whose business logic is best centered somewhere else: Where the multiple actions need to happen within the same lock, but are more complicated, e.g. not all the items are remove(), perhaps only ones that match a Predicate are? Do I think put in a 'public void remove(Predicate p)' method in the Example class, or do I allow the lock to be accessed by the calling code?
Mike
+2  A: 

In general, this is a classic multithreaded design issue. By synchronizing the data structure rather than synchronizing concepts that use the data structure, it's hard to avoid the fact that you essentially have a reference to the data structure without a lock.

I would recommend that locks not be done so close to the data structure. But it's a popular option.

A potential technique to make this style work is to use an editing tree-walker. Essentially, you expose a function that does a callback on each element.

// pointer to function:
//      - takes Object by reference and can be safely altered 
//      - if returns true, Object will be removed from list

typedef bool (*callback_function)(Object *o);

public void editAll(callback_function func) {  
    synchronized (lock) {
          for each element o { if (callback_function(o)) {remove o} } }  
} 

So then your loop becomes:

bool my_function(Object *o) {
 ...
     if (some condition) return true;
}

...
   editAll(my_function);
...

The company I work for (corensic) has test cases extracted from real bugs to verify that Jinx is finding the concurrency errors properly. This type of low level data structure locking without higher level synchronization is pretty common pattern. The tree editing callback seems to be a popular fix for this race condition.

Dave Dunn