views:

462

answers:

5

Hello
I have program consisting of a number of classes. I have a problem with the interraction of two of the classes - WebDataCache and Client. The problem classes are listed below.

WebData:
This is just a data class representing some data retrieved from the internet.
WebService:
This is just a web service wrapper class which connects to a particular web service, reads some data and stores it in an object of type WebData.
WebDataCache:
This is a class which uses the WebService class to retreive data that's cached in a map, keyed by the ids fields of the data.
Client:
This is is a class which contains a refrence to an instance of the WebDataCache class and uses the cached data.

The problem is (as illustrated below) when the class is looping through the cached data, it is possible for the WebDataCache to update the underlying collection.

My question is how do I synchronize access to the cache?

I don't want to synchronize the whole cache as there are multiple instance of the Client class, however each instantiated with a unique id (i.e. new Client(0,...), new Client(1,...), new Client(2,...), etc each instance only interested in data keyed by the id the client was instansiated with.

Are there any relevent design patterns I can use?

class WebData {
    private final int id;
    private final long id2;

    public WebData(int id, long id2) {
        this.id = id;
        this.id2 = id2;
    }

    public int getId() { return this.id; }

    public long getId2() { return this.id2; }
}

class WebService {
    Collection<WebData> getData(int id) {
        Collection<WebData> a = new ArrayList<WebData>();
        // populate A with data from a webservice
        return a;
    }
}

class WebDataCache implements Runnable {
    private Map<Integer, Map<Long, WebData>> cache =
        new HashMap<Integer, Map<Long, WebData>>();
    private Collection<Integer> requests =
        new ArrayList<Integer>();

    @Override
    public void run() {
        WebService webSvc = new WebService();
        // get data from some web service
        while(true) {
            for (int id : requests) {
                Collection<WebData> webData = webSvc.getData(id);
                Map<Long, WebData> row = cache.get(id);

                if (row == null)
                    row = cache.put(id, new HashMap<Long, WebData>());
                else
                    row.clear();

                for (WebData webDataItem : webData) {

                    row.put(webDataItem.getId2(), webDataItem);
                }
            }
            Thread.sleep(2000);
        }
    }

    public synchronized Collection<WebData> getData(int id){
        return cache.get(id).values();
    }

    public synchronized void requestData(int id) {
        requests.add(id);
    }
}

-

class Client implements Runnable {
    private final WebDataCache cache;
    private final int id;

    public Client(int id, WebDataCache cache){
        this.id = id;
        this.cache = cache;
    }
    @Override
    public void run() {

        cache.requestData(id);

        while (true) {


            for (WebData item : cache.getData(id)) {
            // java.util.ConcurrentModificationException is thrown here...
            // I understand that the collection is probably being modified in WebDataCache::run()
            // my question what's the best way to sychronize this code snippet?
            }
        }
    }
}

Thanks!

+5  A: 

Use java.util.concurrent.ConcurrentHashMap instead of plain old java.util.HashMap. From the Javadoc:

A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.

http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html

So you would replace:

private Map<Integer, Map<Long, WebData>> cache =
    new HashMap<Integer, Map<Long, WebData>>();

With

private Map<Integer, Map<Long, WebData>> cache =
    new ConcurrentHashMap<Integer, Map<Long, WebData>>();
LES2
+3  A: 

My best recommendation is to use an existing cache implementation such as JCS or EhCache - these are battle tested implementations.

Otherwise, you have a couple of things going on in your code. Things that can break in funny ways.

  • HashMap can grow infinite loops when modified concurrently by multiple threads. So don't. Use java.util.concurrent.ConcurrentHashMap instead.
  • The ArrayList that you use for WebDataCache.requests isn't thread-safe either and you have inconsistent synchronization - either change it to a safer list implementation from java.util.concurrent or make sure that all access to it is synchronizing on the same lock.
  • Lastly, have your code checked with FindBugs and/or properly reviewed by someone with solid knowledge and experience on writing multi-threaded code.

If you want to read a book on this stuff, I can recommend Java Concurrency in Practice by Brian Goetz.

Christian Vest Hansen
Yeah, I wanted to comment on the List implementations used too. Basically, all of your shared data structures need to be concurrency-ready (TM). But Ehcache would be the best solution (or a similar product).
LES2
A: 

You can synchronize on the row returned by the cache who is at the end who holds the collection that is being shared.

On WebDataCache:

            Map<Long, WebData> row = cache.get(id);

            if (row == null) {
                row = cache.put(id, new HashMap<Long, WebData>());
             } else synchronized( row ) {
                row.clear();
             }

            for (WebData webDataItem : webData) synchronized( row ) {

                row.put(webDataItem.getId2(), webDataItem);
            }

           // it doesn't make sense to synchronize the whole cache here. 
           public Collection<WebData> getData(int id){
               return cache.get(id).values();
           }

On Client:

         Collection<WebData>  data = cache.getData(id);
         synchronized( data ) {
             for (WebData item : cache.getData(id)) {
             }
         }

Of course this is far from perfect it just answer the question of what to synchronize. In this case it would be the access to the underlaying collection in. row.clear row.put on the cache and the iteration on the client.

BTW, why do you have a Map in the cache, and you use a collection in the client. You should use the same structure on both and don't expose the underlying implementation.

OscarRyz
+2  A: 

The answer from LES2 is good except that you would have to replace:

 row = cache.put(id, new HashMap<Long, WebData>());

with:

row = cache.put(id, new ConcurrentHashMap<Long, WebData>());

For that's the one that hold the "problematic" collection and not the whole cache.

OscarRyz
@Oscar - the whole cache is problematic too because of the non-thread-safe way it is being accessed and updated.
Stephen C
+1  A: 

In addition to the other posted recommendations, consider how often the cache is updated versus just being read. If the reading dominates and updating is rare, and it's not critical that the reading loop be able to see every update immediately, consider using a CopyOnWriteArraySet. It and its sibling CopyOnWriteArrayList allow concurrent reading and updating of the members; the reader sees a consistent snapshot unaffected by any mutation of the underlying collection -- analogous to the SERIALIZABLE isolation level in a relational database.

The problem here, though, is that neither of these two structures give you your dictionary or associative array storage (a la Map) out of the box. You'd have to define a composite structure to store the key and value together, and, given that CopyOnWriteArraySet uses Object#equals() for membership testing, you'd have to write an unconventional key-based equals() method for your structure.

seh