views:

756

answers:

6

I often need to implement DAOs for reference data that doesn't change very often. I sometimes cache this in a collection field on the DAO - so that it is only loaded once and explicitly updated when required.

However, this brings in many concurrency issues: what if another thread attempts to access the data while it is loading or being updated?

Obviously this can be handled by making both the getters and setters of the data synchronised - but for a large web application this is quite an overhead.

I've included a trivial flawed example of what I need as a strawman. Please suggest alternative ways to implement this.

public class LocationDAOImpl implements LocationDAO {

    private List<Location> locations = null;

    public List<Location> getAllLocations() {
        if (locations == null) {
            loadAllLocations();
        }
        return locations;
    }

    private void loadAllLocations() {
        // loads the reference data from the database and assigns it to locations
    }
}

For further information I'm using Hibernate and Spring but this requirement would apply across many technologies.

Some further thoughts:

Should this not be handled in code at all - instead let ehcache or similar handle it? Is there a common pattern for this that I'm missing? There are obviously many ways this can be achieved but I've never found a pattern that is simple and maintainable.

Thanks in advance!

+6  A: 

The simplest and safest way is to include the Ehcache library in your project and use it to set up a cache. Its developers have already solved the issues you will encounter, and they have made the library as fast as possible.
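For example, a cache for the location data could be declared in `ehcache.xml` (the cache name and sizing values below are illustrative, assuming the Ehcache 2.x configuration format):

```xml
<ehcache>
    <!-- "locations" is a hypothetical cache name; size it for your data set -->
    <cache name="locations"
           maxElementsInMemory="500"
           eternal="true"
           overflowToDisk="false"/>
</ehcache>
```

The DAO then looks the cache up via the CacheManager and only hits the database on a miss.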

Aaron Digulla
+1  A: 

If your reference data is immutable, Hibernate's second-level cache could be a reasonable solution.
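For illustration, enabling it in Hibernate 3-era configuration might look like this (the property values assume Ehcache as the cache provider, and the entity name is taken from the question's example):

```xml
<!-- hibernate.cfg.xml: turn on the second-level cache (Hibernate 3.x, Ehcache provider) -->
<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.provider_class">net.sf.ehcache.hibernate.EhCacheProvider</property>

<!-- Location.hbm.xml: mark the immutable reference entity as read-only cacheable -->
<class name="Location" table="LOCATION" mutable="false">
    <cache usage="read-only"/>
    ...
</class>
```

The `read-only` strategy is the cheapest concurrency strategy and is exactly what immutable reference data wants.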

HeDinges
A: 

I think it's best not to do it yourself, because getting it right is very difficult. Using EhCache or OSCache with Hibernate and Spring is a far better idea.

Besides, it makes your DAOs stateful, which might be problematic. You should have no state at all, besides the connection, factory, or template objects that Spring manages for you.

UPDATE: If your reference data isn't too large and truly never changes, perhaps an alternative design would be to create enumerations and dispense with the database altogether. No cache, no Hibernate, no worries. oxbow_lakes' point may be worth considering: it could be a very simple system.
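A sketch of that enumeration approach (these particular locations and fields are invented for illustration):

```java
// Reference data baked into the code as an enum: no database, no cache,
// and thread safety comes for free because enum instances are immutable.
public enum Location {
    LONDON("LON", "London"),
    NEW_YORK("NYC", "New York");

    private final String code;
    private final String displayName;

    Location(String code, String displayName) {
        this.code = code;
        this.displayName = displayName;
    }

    public String getCode() { return code; }
    public String getDisplayName() { return displayName; }
}
```

Callers use Location.values() or Location.valueOf(name); because enum instances are immutable singletons, no synchronization is needed anywhere.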

duffymo
Why would you bother using something like ehcache and hibernate for what might be a very simple system? Adding dependencies on such heavyweight frameworks (as Hibernate) is a big decision in my opinion. I learnt the hard way that this off-the-shelf approach can come back to bite you
oxbow_lakes
He said he was already using Hibernate, so it seems a better idea to use EhCache than to write your own. The issue of whether or not to use Spring or Hibernate versus writing your own is another question.
duffymo
+3  A: 

In situations where I've rolled my own reference data cache, I've typically used a ReadWriteLock to reduce thread contention. Each of my accessors then takes the form:

public PersistedUser getUser(String userName) throws MissingReferenceDataException {
    PersistedUser ret;

    rwLock.readLock().lock();
    try {
        ret = usersByName.get(userName);

        if (ret == null) {
            throw new MissingReferenceDataException(String.format("Invalid user name: %s.", userName));
        }
    } finally {
        rwLock.readLock().unlock();
    }

    return ret;
}

The only method to take out the write lock is refresh(), which I typically expose via an MBean:

public void refresh() {
    logger.info("Refreshing reference data.");
    rwLock.writeLock().lock();
    try {
        usersById.clear();
        usersByName.clear();

        // Refresh data from underlying data source.

    } finally {
        rwLock.writeLock().unlock();
    }
}

Incidentally, I opted for implementing my own cache because:

  • My reference data collections are small so I can always store them all in memory.
  • My app needs to be simple / fast; I want as few dependencies on external libraries as possible.
  • The data is rarely updated and when it is the call to refresh() is fairly quick. Hence I eagerly initialise my caches (unlike in your straw man example), which means accessors never need to take out the write lock.
Adamski
A: 

Obviously this can be handled by making both the getters and setters of the data synchronised - but for a large web application this is quite an overhead.

I've included a trivial flawed example of what I need as a strawman. Please suggest alternative ways to implement this.

While this may be somewhat true, note that the sample code you've provided certainly needs to be synchronized to avoid concurrency issues when lazy-loading the locations. If that accessor is not synchronized, you will have:

  • Multiple threads entering loadAllLocations() at the same time, each loading the data redundantly
  • Some threads entering loadAllLocations() even after another thread has completed the method and assigned the result to locations: under the Java Memory Model there is no guarantee that other threads will see the write to the variable without synchronization.

Be careful when using lazy loading/initialization; it seems like a simple performance boost, but it can cause lots of nasty threading issues.
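One minimal safe variant of the strawman, for comparison (a sketch, with hard-coded data standing in for the database call): make the field volatile and synchronize only the load path, i.e. double-checked locking, which is valid on Java 5+ precisely because of the volatile write.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LazyRefData {
    // volatile guarantees other threads see the fully-assigned list (Java 5+ memory model)
    private volatile List<String> locations;

    public List<String> getAllLocations() {
        List<String> result = locations;
        if (result == null) {
            synchronized (this) {            // only threads racing the first load pay for the lock
                result = locations;
                if (result == null) {
                    result = locations = loadAllLocations();
                }
            }
        }
        return result;
    }

    private List<String> loadAllLocations() {
        // stand-in for the real database call
        return Collections.unmodifiableList(Arrays.asList("London", "New York"));
    }
}
```

After the first load, readers never take the lock; the unmodifiable list keeps the published data immutable.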

matt b
Thanks Matt - I realise that it is broken; that's why I referred to it as flawed.
Pablojim
+2  A: 

If you just want a quick roll-your own caching solution, have a look at this article on JavaSpecialist, which is a review of the book Java Concurrency in Practice by Brian Goetz.

It talks about implementing a basic thread safe cache using a FutureTask and a ConcurrentHashMap.

The way this is done ensures that only one concurrent thread triggers the long running computation (in your case, your database calls in your DAO).

You'd have to modify this solution to add cache expiry if you need it.
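A condensed version of that pattern, adapted from the book's Memoizer example (class and interface names here are illustrative):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Caches the result of an expensive computation, guaranteeing that the
// computation for a given key runs at most once, even under contention.
public class Memoizer<A, V> {

    public interface Computable<A, V> {
        V compute(A arg) throws Exception;
    }

    private final ConcurrentMap<A, Future<V>> cache = new ConcurrentHashMap<A, Future<V>>();
    private final Computable<A, V> computable;

    public Memoizer(Computable<A, V> computable) {
        this.computable = computable;
    }

    public V compute(final A arg) throws InterruptedException {
        Future<V> f = cache.get(arg);
        if (f == null) {
            FutureTask<V> ft = new FutureTask<V>(new Callable<V>() {
                public V call() throws Exception {
                    return computable.compute(arg); // e.g. the DAO's database call
                }
            });
            f = cache.putIfAbsent(arg, ft);         // atomic check-then-act
            if (f == null) {                        // we won the race: run the task
                f = ft;
                ft.run();
            }
        }
        try {
            return f.get();                         // other threads block here until done
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}
```

putIfAbsent ensures only one FutureTask is ever registered per key; every other thread blocks in f.get() until that single computation finishes.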

Another consideration when caching it yourself is garbage collection. Unless you use a WeakHashMap for your cache, the GC cannot reclaim the memory the cache uses when it is needed elsewhere. If you are caching infrequently accessed data (but data that is still worth caching because it is expensive to compute), you might want to help the garbage collector out when memory runs low by using a WeakHashMap.
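A minimal sketch of that idea (a hypothetical wrapper class; note that WeakHashMap holds its keys weakly, so an entry survives only while its key is strongly referenced elsewhere, and the map itself needs the synchronizedMap wrapper to be thread safe):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCache {
    // Entries become eligible for GC once their key is no longer
    // strongly referenced anywhere else in the application.
    private final Map<String, Object> cache =
            Collections.synchronizedMap(new WeakHashMap<String, Object>());

    public void put(String key, Object value) {
        cache.put(key, value);
    }

    public Object get(String key) {
        return cache.get(key); // may return null if the entry was collected
    }
}
```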

A_M
Accepted as I learnt the most from it, and there are certain situations where using an out-of-the-box cache solution isn't sufficient.
Pablojim