views:

270

answers:

8

I have a class which contains a static collection that can be used across all threads. It acts as a cache. When it is first loaded from the database, it is supposed to never change. However, in rare cases, we have to clear this cache. The current way to do this is to restart the server. I want to implement a clear cache method. What is the best way to do this?

Example class:

public class DepartmentRepository {
    private static Collection<Department> departments = new ArrayList<Department>();

    public synchronized Collection<Department> getAllDepartments() {
        if (departments.isEmpty()) {
            //do database lookup
        }
        return departments;
    }
    //more methods excluded
}

I want to add a method that looks like this:

public void clearCache() {
    departments.clear();
}

But I need to make sure it clears it across all threads and doesn't break anything.

+4  A: 

You have to synchronize the clear on the same object as the get. If you are doing that by declaring the method syncronized, then the method has to be on the same object (or static as part of the same class).

However, in terms of not breaking anything it is going to take more work, because now what could happen is that a different object will be in the middle of reading that collection, and you will get concurrent modification exceptions.

the easiest way around this whole problem is to use a synchronized collection, so that all access to the internals of the collection is thread safe.

However, that still won't stop some code from getting a concurrent modification exception if it is in the middle of iterating over the collection.

The best way around this problem is to use a CopyOnWriteArrayList. This class is designed to prevent this kind of problem by giving any code iterating over the collection a stale copy rather than throw an exception.

Yishai
Just synchronizing on the DepartmentRepository in the clear() method might not be enough, as the department collection itself is returned - every thread that calls getDepartment would have to synchroize whenever they access anything in the returned collection as well
nos
Very true. I expanded my answer.
Yishai
+4  A: 

Honestly the easiest way is just to:

private static Collection<Department> departments = Collections.synchronizedList(new ArrayList<Department>());

instead of the way you have it.

Another way would be:

public void clearCache() {
  synchronized(departments) { 
    departments.clear();
  }
}

assuming you're synching in your other threads in methods accessing this collection.

Nick
And the departments field can be made final. And the getAllDepartments method can return an unmodifiable view of the list. So you don't let it change if it's not suppose to.
Christian Vest Hansen
+1 for the Collections.synchronizedList. The second option is too risky; someone will screw it up. Also, consider wrapping any department list that's returned with `Collections.unmodifiableList`, or some bad actor could make a change that's visible to every instance.
erickson
This is insufficient, because it's still possible to have a get while the departments are in an incomplete state. The complete repopulation needs to happen in such a way that a get() can't happen while it's in progress. If you synchronize the collection you're guaranteeing that you won't get a concurrentModificationException as one thread deletes while another gets. so you'd need to do something like deleteAll(); addAll(new values); syncrhonized on departments lock. Or swap in a new collection synchronized with the same lock as your get() method.
Steve B.
Adding to this, a CopyOnWriteArrayList would be suitable for this problem as well.
nos
Is there performance hit with using a synchronizedList? This is at the core of a massively overgrown architecture.
emulcahy
Of course there's a cost. Synchronization isn't free. But you should compare it to the cost of bouncing the server.
CPerkins
If it's used a lot and you are going to synchronize every read, then it may be less expensive to bounce the server.
Bill K
+1  A: 

One problem is that you have a static (one per class) collection and non-static access methods. I would (at first) synchronize on the actual collection object itself.

e.g.

synchronized (departments) {
   departments.clear();
}

and do this for all other methods using the collection.

Brian Agnew
yep. This is the only way to be sure.
Nick
A: 

Try switching from ArrayList to Vector. In Java, Vectors are thread-safe. See also the fail-fast behavior of Vectors, which avoids concurrent modifications.

fjsj
vectors do pretty much the same thing as: private static Collection<Department> departments = Collections.synchronizedList(new ArrayList<Department>());They sync on most ops, just not on iteration changes in the collection (which are the dangerous ones).
Nick
The downside of Vectors is that they are considered obsolete by pretty much every Java programmer I have ever talked to.
Fortyrunner
@Fortyrunner, that is true, but since there would only be one line in use for it (the rest access through the interface) it is OK. You could wrap it with a Collections.syncronizedCollection, but that is a bit slower than Vector.
Yishai
How much slower? Have you measured it? I'll bet in practice (on a modern JVM such as Java 5/6) there is no percievable difference. Always go for the cleaner design if you can.
Fortyrunner
@Fortyrunner: synchronizedList creates a new object to wrap the underlying list, whereas Vector is already synchronized. So there is measurable difference in some circumstances: one fewer object and one fewer method call for each access. However I'd worry that someone would "clean up the code" and replace Vector with ArrayList without synchronizing....
Mr. Shiny and New
+2  A: 

Another issue you have is with the basic design, where your synchronized method returns a collection object, over which callers can iterate.

From the javadoc: It is imperative that the user manually synchronize on the returned list when iterating over it:

  List list = Collections.synchronizedList(new ArrayList());
      ...
  synchronized(list) {
      Iterator i = list.iterator(); // Must be in synchronized block
      while (i.hasNext())
          foo(i.next());
  }

If they're not doing that... if any caller is not doing that, clear will cause them problems as they iterate.

I'd suggest that you consider scrapping the getAllDepartments method and replacing with a set of methods which do lookup, or at least do defensive copies first.

CPerkins
+2  A: 

In response to a follow-up comment by the OP:

Is there performance hit with using a synchronizedList? This is at the core of a massively overgrown architecture.

Yes, there is a performance hit. You'd have to measure it to find out how much. For one thing, synchronizing every operation to the Department collection might result in a lot of contention even when most of the operations are read-only. For thread safety with performance, consider something like this:

import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DepartmentRepository
{

  private static final ReentrantReadWriteLock lock =
    new ReentrantReadWriteLock(true);

  private static Collection<Department> departments = null;

  public Collection<Department> getAllDepartments()
  {
    Lock read = lock.readLock();
    read.lock();
    try {
      /* Check whether the departments are loaded. */
      if (departments == null) {
        /* If not loaded, release read-lock and acquire write-lock. */
        read.unlock();
        Lock write = lock.writeLock();
        write.lock();
        try {
          /* Recheck condition for updates by another thread. */
          if (departments == null) {
            Collection<Department> tmp = ...; // Initialize. 
            departments = Collections.unmodifiableCollection(tmp);
          }
          /* Downgrade from write-lock to read-lock. */
          read.lock();
        }
        finally {
          write.unlock();
        }
      }
      return departments;
    }
    finally {
      read.unlock();
    }
  }

  public void clearCache()
  {
    Lock write = lock.writeLock();
    write.lock();
    try {
      departments = null;
    }
    finally {
      write.unlock();
    }
  }

}

Yes, it's messy. Sharing data between threads is hard.

erickson
I ended up implementing this solution. You may want to add read.lock(); before the first try block.
emulcahy
Good catch! Sorry about the mistake.
erickson
+1  A: 

I think your concern is with locking on a read (which is obviously problematic). If you really have to avoid this, it actually is possible (to a degree).

Since you are using an ArrayList, the underpinnings are an array.

If you implement it as an array instead, you could actually remove a level of method-call indirection, and implement an easy non-locking adding mechanism.

Just pre-allocate some extra spaces in the array and leave them unused (null).

Write your "normal" mode to skip over nulls.

If you need to add to it, just set up your object completely then set it into the array. There is no place where threading can screw this up, so as long as you have added your new, completely ready object, everything should be fine with no locking at all. The other thread will, at some point, go from not seeing the new object to seeing it, but will never be in an invalid state.

For deletes just set the element to null.

In a multi-CPU machine, there is the chance that information is cached in the CPU cache. This isn't a problem on adds (the new object just won't be available instantly), on deletes it could be problematic--not sure.

You might even be able to resize the array by creating the new one and copying everything over and then replace the handle to the existing array with the new one, but the caching issue makes this awful hard to predict.

EDIT: I just remembered at one point reading that even setting a pointer wasn't guaranteed in Java without synchronization--that it was possible for the other CPU to get 1/2 the address wrong. Does anyone know if this was ever true and if so on what versions? It would completely invalidate this solution if true.

Bill K
A: 

Would it be possible to just create a new collection and swap them out once it's created? Ie. instead of:

Write: 
    Lock
    Clear collection
    Load collection
    Unlock
Read: 
    Lock
    Read
    Unlock

To use a new (read-only) list object:

Read: 
    Read (no locks)
Write: 
    Create new collectionwith new data
    Set collectionvar to point at new collection(fast)

This should avoid the need for locks all together.

I guess the answer to the question is the question: Is variable assignment atomic in Java.

Edit: I found at least one page that states that "assignment to variables of any type except long or double is atomic", which seems to indicate that my method would work.

RHSeeger