views:

127

answers:

4

I've been looking at the problem of writing a concurrent Multimap, and I have an implementation backed by the Google Guava AbstractSetMultimap and a MapMaker computing map that creates on demand the values-collections as a set view over a ConcurrentHashMap. With some care over the view collections and various wrappers, I think this gets pretty close.

The big problem, which has already been discussed by others who have tried this, appears to be that of removing the values-collections from the underlying map when they become empty, without introducing race conditions.

A couple of options appear to exist.

  • leave the empty collections there. This will leak some CHMs, but I believe it to be at least correct.
  • try optimistically to remove the collection when empty and compensate if anything else appears in it. This is full of races and seems intrinsically impossible to fix.
  • synchronise everything on the values-collection, which would at least allow this removal, but at the cost of any concurrency after the initial lookup by key.
  • for a smaller penalty (perhaps, depending on usage patterns?), perhaps synchronise on values-collection creation and removal, need to check whether that covers everything.

Questions:

  • Does anyone know of any better implementation than this? Can we do better composing bits of MapMaker, or does this need a specialised ConcurrentHashMultimap written from scratch?
  • If it's difficult to improve much on this, is this leak likely to be much of an issue in practice? Notable collections such as java.util.HashMap, juc.ConcurrentHashMap and ArrayDeque do not resize the backing store downwards, and ArrayList does not do so automatically. As long as we clear out the objects, I wonder whether this will matter too much.

Thanks


Edit: see also the discussion here on the guava mailing list.

A: 

If you really don't want to leak the empty collection you could try to atomically replace it with a per-key placeholder Future. That way concurrent add/remove or add/add should be able to reach consistent state when expanding again.

Holger Hoffstätte
That may work, but will leak the placeholder value and pin the key in memory. Granted, the placeholder is likely to be smaller than the empty collection, so it's an improvement.
Joe Kearney
A: 

Using immutable collections as the values is the best way to solve/simplify the basic concurrency issues as you can then remove with the atomic replace method. Unfortunately, there aren't immutable collections with fast copy/update profiles in general use, so you usually need to do quite expensive copy everything.

Jed Wesley-Smith
Immutable values-collections certainly works here in terms of safety, but yeah all the copying around will kill this in the useful case where there's lots of contention.
Joe Kearney
In my experience, there needs to be a lot of contention before this actually becomes a problem, and for medium levels of contention this is actually a clear win – particularly where reads dominate.
Jed Wesley-Smith
A: 

As a follow-up, here are some details I omitted from the earlier discussions, which you linked to, about my concurrent multimap implementation.

That implementation followed your first suggestion: leaving empty collections in the backing map. The following life-view behavior complicates your other suggestions.

Multimap<String, Integer> multimap = HashMultimap.create();
Set<Integer> set = multimap.get("foo");
multimap.put("foo", 1);
int count = set.size();   // equals 1

For a real-world application, as opposed to a collection library class, something less than a fully concurrent multimap would probably suffice. You could define your own class that implements a subset of the Multimap interface or a limited selection of concurrency guarantees. Or, your application logic could minimize contention of a synchronized multimap to avoid performance problems.

Jared Levy
The way I solved that live-view behaviour was that `multimap.get("foo")` returns a `ForwardingSet` delegating to the real one. So every operation on that set looks up the underlier, which can change between any two calls. This indirection handles most of the issues there, I think, but causes a spurious values-collection to be created on every operation on a non-present key. Thanks for you comments.
Joe Kearney
+1  A: 

I asked the same question earlier, and ended up implementing 4 different implementations.

The question: http://stackoverflow.com/questions/3635292/high-performance-concurrent-multimap-java-scala

The impl (I call it Index) http://github.com/jboner/akka/blob/master/akka-actor/src/main/scala/actor/ActorRegistry.scala#L314

Viktor Klang