views:

166

answers:

2

I've been going through the Java Concurrency in Practice book. It is definitively a great reference. I'm trying to extend the last example of the efficient and scalable result set cache to incorporate soft references. The objects stored in my cache can grow rather large and I need some way for those objects to get garbage collected when memory is low hence the soft references. Getting the thread safe and soft reference concepts to work has tested my limits and I need help. So far this is what I have working. I'm not sure if this is really thread safe. Could anyone please take a look at it and comment?

public class RelationCollectionCache {

    // properties

    private final ConcurrentMap<Integer, Reference<Future<RelationCollection>>> cache =
        new ConcurrentHashMap<Integer, Reference<Future<RelationCollection>>>();

    private final ReferenceQueue<? super Future<RelationCollection>> queue = 
        new ReferenceQueue<Future<RelationCollection>>();

    // constructors

    public RelationCollectionCache() {
    }

    // methods

    public RelationCollection load(final Integer id) {

        Reference<Future<RelationCollection>> reference = cache.get(id);
        Future<RelationCollection> future = (reference == null) ? null : reference.get();

        if (future == null) {

            Callable<RelationCollection> eval = new Callable<RelationCollection>() {
                public RelationCollection call() throws Exception {
                    return compute(id);
                }
            };

            FutureTask<RelationCollection> task = new FutureTask<RelationCollection>(eval);
            reference = cache.putIfAbsent(id, new InternalSoftReference(id, task));
            if (((reference == null) ? null : reference.get()) == null) {
                future = task;
                task.run();
            }

        }

        try {
            return future.get();
        } catch (ExecutionException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        return null;

    }

    private RelationCollection compute(Integer id) {

        RelationCollection collection = new RelationCollection();

        // lengthy computation of a large collection

        return collection;

    }

    public RelationCollection get(Integer id) {

        clean();

        Reference<Future<RelationCollection>> reference = cache.get(id);
        Future<RelationCollection> future = (reference == null) ? null : reference.get();
        if (future == null)
            return null;

        try {
            return future.get();
        } catch (ExecutionException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        return null;

    }

    public void remove(Integer id) {

        clean();

        cache.remove(id);

    }

    public void clean() {

        InternalSoftReference reference = (InternalSoftReference) queue.poll();
        while (reference != null) {
            cache.remove(reference.id);
            reference = (InternalSoftReference) queue.poll();
        }

    }

    // internal classes

    private class InternalSoftReference extends SoftReference<Future<RelationCollection>> {

        private Integer id;

        public InternalSoftReference(Integer id, Future<RelationCollection> future) {
            super(future, queue);
            this.id = id;
        }

    }

}

This implementation calls the clean method at every operation to get rid of garbage collected references. This could also be a Thread but that incorporates another level of concurrency which I haven't even explored.

+1  A: 

Here's my pass at it. I haven't tested this code, but I think it catches most of the cases

public class RelationCollectionCache {

    // properties
    private final ConcurrentMap<Integer, Reference<Future<RelationCollection>>> cache =
        new ConcurrentHashMap<Integer, Reference<Future<RelationCollection>>>();

    // constructors
    public RelationCollectionCache() {
    }

    // methods
    public RelationCollection load(final Integer id) {

        Reference<Future<RelationCollection>> reference = cache.get(id);
        Future<RelationCollection> future = (reference == null) ? null : reference.get();

        while (future == null) {
            final Callable<RelationCollection> eval = new Callable<RelationCollection>() {
                public RelationCollection call() throws Exception {
                    return compute(id);
                }
            };

            final FutureTask<RelationCollection> task = new FutureTask<RelationCollection>(eval);
            final SoftReference<Future<RelationCollection>> newReference =
                    new SoftReference<Future<RelationCollection>>(task);
            // Need to use replace as we may have an expired reference in the cache.
            if (cache.replace(id, reference, newReference)) {
                task.run();
                future = task;
            } else {
                // Another thread may have done the replace
                reference = cache.get(id);
                future = (reference != null) ? reference.get() : null;
            }
        }

        return getFromFuture(future);
    }

    private RelationCollection compute(final Integer id) {
        final RelationCollection collection = new RelationCollection();
        // lengthy computation of a large collection
        return collection;
    }

    public RelationCollection get(final Integer id) {

        Reference<Future<RelationCollection>> reference = cache.get(id);

        if (reference == null) {
            return null;
        }

        Future<RelationCollection> future = reference.get();

        if (future == null) {
            // Clean up the expired reference
            while (!cache.remove(id, reference)) {
                reference = cache.get(id);
                future = (reference == null) ? null : reference.get();
                // Its possible another thread may have replaced the
                // expired reference with a valid in the mean time.
                if (future != null) {
                    return getFromFuture(future);
                }
            }
            return null;
        }

        return getFromFuture(future);
    }

    private static RelationCollection getFromFuture(final Future<RelationCollection> future) {
        try {
            return future.get();
        } catch (final ExecutionException e) {
            e.printStackTrace();
        } catch (final InterruptedException e) {
            e.printStackTrace();
        }

        return null;
    }

    public void remove(final Integer id) {
        cache.remove(id);
    }
}

I removed the clean method as there are a lot of cases where it won't work. An object will not always transition to being softly referenced. It is possible for an object to from strongly referenced directly to phantom referenced and therefore will not appear in the reference queue. I embedded the clean up as part of load and get operations.

Michael Barker
Won't this implementation leak stale values? The only way an entry might be removed is if someone tries to `get` it.
McDowell
Yes, and so will the original. The most common way to prevent leakage with most caching implementations is to support an additional eviction policy, e.g. time based and have a separate thread to do the clean up. NB - it will clean up on both get and load.
Michael Barker
Thanks for your through answer. I believe this does not properly release garbage collected items, at least not until one tries to get it, but we could deal with that in different ways. When the garbage collector does a pass it nullifies the referenced object thus leaving an entry in the cache <Integer, Reference> in which the Reference.get() will return null. This is why I was registering the garbage collected items in the reference queue to remove them from cache. I could certainly do this with another Thread.
rmarimon
What would be the correct way to test this. I find hard to grasp this code and since it is going to be an important building block of a very large system I need to be sure there are not going to be any surprises.
rmarimon
If this is going to be part of an important system, then I would recommend Keven B.'s answer below, have a look at MapMaker from google collections. It will probably do most of what you need, or otherwise look an existing cache solution that will handle eviction of old objects.Testing of this is very hard, as well as solid unit tests, you need a lot of peer review from someone who is well versed in concurrent programming (its easy to get wrong).
Michael Barker
+1  A: 

This is extremely hard. Can you just use MapMaker from google-collections?

Kevin Bourrillion
I will give MapMaker a try.
rmarimon