views:

243

answers:

10

In our system, we have a method that will do some work when it's called with a certain ID:

public void doWork(long id) { /* ... */ }

Now, this work can be done concurrently for different IDs, but if the method is called with the same ID by 2 threads, one thread should block until it's finished.

The simplest solution would be to have a Map that maps from the Long ID to some arbitrary object that we can lock on. One problem I foresee with this is that we can have tons of IDs in the system and this map will keep growing every day.

Ideally, I think we need a system where we each thread will fetch a lock object, lock when possible, do the work, then signal that we're done with the lock. If it's clear that nobody else is using this particular lock, then safely remove it from the lock map to prevent the memory leak.

I imagine this must be a pretty common scenario so I'm hoping there is an existing solution out there. Anyone know of any?

A: 

Wouldn't it be enough to use a SynchronizedHashMap or Collections.synchronizedMap(Map m) from the java.util.concurrent package instead of a plain HashMap where calls for retrieving and inserting are not synchronized?

something like:

Map<Long,Object> myMap = new HashMap<Long,Object>();
Map<Long,Object> mySyncedMap=Collections.synchronizedMap(myMap);
smeg4brains
+4  A: 

I'd say you're already pretty far to having a solution. Make a LockManager who lazily and reference-counted-ly manages those locks for you. Then use it in doWork:

public void doWork(long id) {
    LockObject lock = lockManager.GetMonitor(id);
    try {
        synchronized(lock) {
            // ...
        }
    } finally {
        lock.Release();
    }
}
David Schmitt
I started fixing the code, but then - it's too not-Java to fix.. (Object doesn't have 'release')
Bozho
Yeah, this is pretty similar to what I have now, but I don't trust myself enough to do this right. The 2 missing pieces here are the implementations of the Lock and LockManager classes.
Outlaw Programmer
To me it sounds like a viable solution. Replace the `Object` with some other custom class.
Adeel Ansari
@Outlaw, See the java.util.concurrent.locks package.
Kevin
@Bozho: thanks for trying anyways :-) Since its not real code I just called it "LockObject". There, I fixed it! (http://thereifixedit.com/)
David Schmitt
A: 

Premature optimization is the root of evil

Try it with a (synchronized) map.

Maybe if it grows too big, you can clear its content at regular intervals.

chburd
I wouldn't call it premature. We know for a fact that thousands and thousands of unique IDs are used every day. I don't want to flush the map because it's quite likely at least 1 lock will be in use. I'm thinking of using a map where the oldest entry is removed when the map is full but even then there's no guarantee that lock is not in use.
Outlaw Programmer
A: 

Hi,

I suggest that you use the utilities from java.util.concurrent, especially the class AtomicLong. See related javadoc

Jean-Philippe Caruana
I don't think there's 1 particular class there that will help me. I'm looking for a strategy on how to use those pieces together.
Outlaw Programmer
Please read the page : it will help understand how to use it
Jean-Philippe Caruana
@Jean-Philippe, he doesn't need an atomic long. He needs a multitude of different locks.
Mike Daniels
+4  A: 

To start with:

  1. You can't lock on a primitive and
  2. Don't lock on a Long unless you're careful how you construct them. Long values created by autoboxing or Long.valueOf() in a certain range are guaranteed to be the same across the JVM which means other threads could be locking on the same exact Long object and giving you cross-talk. This can be a subtle concurrency bug (similar to locking on intern'ed strings).

You're talking here about a lock-striping setup. One end of the continuum is a single giant lock for all ids which will is easy and safe but not concurrent. The other end is a lock per id which is easy (to some degree) and safe and very concurrent but might require a large number of "lock-able objects" in memory (if you don't already have them). Somewhere in the middle is the idea of creating a lock for a range of ids - this lets you adjust concurrency based on your environment and make choices about tradeoffs between memory and concurrency.

ConcurrentHashMap can be used to achieve this as CHM is made up internally of segments (sub-maps) and there is one lock per segment. This gives you concurrency equal to the number of segments (which defaults to 16 but is configurable).

There are a bunch of other possible solutions for partitioning your ID space and creating sets of locks but you are right to be sensitive to the clean up and memory leak issues - taking care of that while maintaining concurrency is a tricky business. You'll need to use some kind of reference counting on each lock and manage the eviction of old locks carefully to avoid evicting a lock that's in the process of being locked. If you go this route, use ReentrantLock or ReentrantReadWriteLock (and not synchronized on objects) as that lets you explicitly manage the lock as an object and use the extra methods available on it.

There is also some stuff on this and a StripedMap example in Java Concurrency in Practice section 11.4.3.

Alex Miller
Heh, I was hoping you'd show up. Is there no better way than mapping my ID range to a fixed number of locks? Ideally, each ID would have its own lock, but these locks be removed when I'm sure that they are not needed anymore. This way I can get maximum concurrency while making sure memory usage doesn't grow indefinitely.
Outlaw Programmer
This is actually exactly what Ehcache with Terracotta does internally (with clustered locks). I can tell you that balancing the constraints of concurrency correctness, performance, and memory usage is a hard problem. I wish I had a perfect solution in the Ehcache space to point at here but I don't (yet) see one, mostly because the internal locks are not exposed in the Ehcache API -> that may happen in the future.
Alex Miller
`ConcurrentHashMap` may use fewer locks, but you do not need to hold one for the entire duration of the `doWork(id)` call.
Matthew
@Matthew - That is a very fair point and is exactly the problem I was describing with Ehcache actually. Take a look at the StripedMap though - I'm going from memory but I think it could be adapted for this.
Alex Miller
+4  A: 

You can try the following little 'hack'

String str = UNIQUE_METHOD_PREFIX + Long.toString(id);
synchornized(str.intern()) { .. }

which is 100% guaranteed to return the same instance.

The UNIQUE_METHOD_PREFIX, may be a hardcoded constant, or may be obtained using:

StackTraceElement ste = Thread.currentThread().getStackTrace()[0];
String uniquePrefix = ste.getDeclaringClass() + ":" +ste.getMethodName();

which will guarantee that the lock happens only on this precise method. That's in order to avoid deadlocks.

Bozho
Good idea. Nice hack.
Adeel Ansari
I was thinking of this but doesn't this just push the memory leak into either Long's or String's internal cache?
Outlaw Programmer
Bad hack, Long.valueOf may return a different object each time, and using string.intern() wastes permspace, and any other code using Strings as lock may cause deadlocks!
josefx
updated - Long.valueOf returns cached instances only in the byte range.
Bozho
@josefx - if a unique method prefix us used (updated), then a deadlock can't occur.
Bozho
@josefx: For the `Long.valueOf()`, you can't just cast a negative, because the answer provided a detail explanation and the caveat. And regarding the `String.intern()`, I think its the opposite. It will save on the space, but do bad from the performance point of view.
Adeel Ansari
okay, free of deadlocks. I still don't like the use of string.intern(), but thats just me.
josefx
I've benchmarked its usage recently and it doesn't "eat" up memory, and isn't that expensive :)
Bozho
@Bozho: But still sounds like a work around. Do you not agree?
Adeel Ansari
certainly :) but on the other hand, it works and is predictable
Bozho
+7  A: 

I invented a thing like that for myself some time ago. I call it an equivalence-class lock, meaning, it locks on all of the things that are equal to the given thing. You can get it from my github, and use it subject to the Apache 2 license, if you like, or just read it and forget it!

Jonathan Feinberg
+1 for providing a free implementation.
David Schmitt
And worth every penny. ;)
Jonathan Feinberg
A: 

You could create a list or set of active ids and use wait and notify:

List<Long> working;
public void doWork(long id) {
synchronized(working)
{
   while(working.contains(id))
   {
      working.wait();
   }
   working.add(id)//lock
}
//do something
synchronized(working)
{
    working.remove(id);//unlock
    working.notifyAll();
}
}

Problems solved:

  • Only threads with same id's wait, all others are concurrent
  • No memory Leak as the "locks" (Long) will be removed on unlock
  • Works with autoboxing

Problems there:

  • while/notifyAll may cause some performance loss with high number of threads
  • Not reentrant
josefx
A: 

This is where I would use a canonicalizing map, which takes your long input and returns a canonical Long object that you can then use to synchronize. I've written about canonicalizing maps here; simply replace String by Long (and to make your life easier, let it take a long as a parameter).

Once you have the canonicalizing map, you'd write your lock-guarded code like this:

Long lockObject = canonMap.get(id);
synchronized (lockObject)
{
    // stuff
}

The canonicalizing map would ensure that the same lockObject is returned for the same ID. When there are no active references to lockObject, they will be eligible for garbage collection, so you won't fill up memory with needless objects.

kdgregory
+3  A: 

You can try something with a ReentrantLock, such that you have a Map<Long,Lock>. Now after lock.release() You can test lock.hasQueuedThreads(). If that returns false you can remove it from the Map.

John V.