tags:

views:

345

answers:

5

Let's say I have the following (assume restricted to java 1.4 so no generics) :

public class CacheManager {
    static HashMap states;
    static boolean statesLoaded;

    public static String getState(String abbrev) {
        if(!statesLoaded) {
            loadStates();
        }
        return (String) states.get(abbrev);
    }

    private static void loadStates() {
        //JDBC stuff to load the data
        statesLoaded = true;
    }
}

In a high-load multi-thread environment like a web app server, this could theoretically have problems if > 1 thread tries to get and load the cache at the same time. (Further assuming there's no startup code on the web app to initialize the cache)

Is simply using Collections.synchronizedMap sufficient to fix this? Does the returned synchronizedMap have performance issues when doing get(), if a lot of threads are accessing it?

Or would it be better to have a non-synchronized HashMap, and instead synchronize on the load method or boolean variable? I would think that if you synchronized either of those, you might end up locking the class.

For instance, if the load method was synchronized, what if 2 threads enter the getStates() method at the same time, and both see that statesLoaded is false. The first one gets a lock on the method, loads the cache and sets statesLoaded to true. Unfortunately, the 2nd thread has already evaluated that statesLoaded was false, and proceeds to the load method once the lock is free. Won't it go ahead and load the cache again?

+1  A: 

You should synchronize this check:

if(!statesLoaded) {
    loadStates();
}

Why ? Multiple threads can get() on the map without any problems. However, you need to atomically check the statesLoaded flag, load the state, and set the flag, check it. Otherwise you could (say) load the states, but the flag would still not be set and be visible as such from another thread.

(You could potentially leave this unsynchronised and allow the possibility of multiple threads to re-initialise the cache, but at the least it's not good programming practise, and at worst could cause you problems further down the line with large caches, differing implementations etc.)

Consequently, having a synchronised map isn't enough (this is quite a common misunderstanding, btw).

I wouldn't worry about the performance impact of synchronisation. It used to be an issue in the past, but is a much more lightweight operation now. As always, measure and optimise when you have to. Premature optimisation is often a wasted effort.

Brian Agnew
so what should my locking object be? the states HashMap? won't this cause a bottleneck? what if I have multiple HashMaps inside this CacheManager. Will it not lock the whole static class if I use the particular HashMap as the locking object around its load check?
So you can lock on the containing object, which is rather coarse, or you can provide a simple object thus : Object lock = new Object(); which would be the lock for this method and little else.
Brian Agnew
Don't forget you're not locking around the get();
Brian Agnew
A: 

Don't try and do this yourself. Use an IoC container like Spring or Guide and get the framework to manage and initialise the singleton for you. This makes your synchronization problems much more manageable.

skaffman
A: 

whats wrong with the Singleton pattern?

public class CacheManager {

    private static class SingletonHolder
    {
     static final HashMap states;
     static
     {
      states = new HashMap();
      states.put("x", "y");
     }
    }

    public static String getState(String abbrev) {
     return (String) SingletonHolder.states.get(abbrev);
    }

}
Trevor Harrison
A: 

Since statesLoaded only can go from false to true I'd go for a solution where you first check the statesLoaded is true, if it is you just skip the initalization logic. If it isn't you lock and check again and if it's still false you load the states it and set the flag to true.

This means that any thread calling getState after the cache is initialized will "early out" and use the map without locking.

something like:

// If we safely know the states are loaded, don't even try to lock
if(!statesLoaded) {
  // I don't even pretend I know javas synchronized syntax :)
  lock(mutex); 
  // This second check makes sure we don't initialize the
  // cache multiple times since it might have changed
  // while we were waiting for the mutex
  if(!statesLoaded) {
    initializeStates();
    statesLoaded = true;
  }
  release(mutex);
}
// Now you should know that the states are loaded and they were only
// loaded once.

This means that the locking will only be involved before and during the actual initalization happens.

If this would be C, I'd also make sure to make the statesLoaded variable to be volatile to make sure the compiler optimize the second check. I don't know how java behaves when it comes to situations like that but I would guess it considers all shared data such as statesLoaded to be potentially dirty when going into synchronization scopes.

Laserallan
+1  A: 

The best way to load the cache in this case is to take advantage of the JVM static initialization:

public class CacheManager {
    private static final HashMap states = new HashMap();

    public static String getState(String abbrev) {
        return (String) states.get(abbrev);
    }

    static {
        //JDBC stuff to load the data
    }
}

The cache will be loaded the first time the class is being used, and since the static initialization is thread safe, the map will be populated safely. Any subsequent calls to retrieve the values can be done without any locking involved.

It is always a good idea to take advantage of static initialization whenever possible. It is safe, efficient, and often quite simple.

sjlee