85 views, 2 answers
Consider this code:

public Object doGet() {
    return getResource();
}

private Object getResource() {
    synchronized (lock) {
        if (cachedResourceIsStale()) {
            downloadNewVersionOfResource();
        }
    }

    return resource;
}

Assuming that doGet will be executed concurrently (and a lot at that), and that downloading the new version of the resource takes a while, are there more efficient ways to do the synchronization in getResource? I know about read/write locks, but I don't think they can be applied here.

Why synchronize at all? If the cache goes stale, all threads accessing the resource while it's still being refreshed by the first one will execute their own refresh. Among other problems this causes, it's hardly efficient.

As BalusC mentions in the comments, I'm currently facing this issue in a servlet, but I'm happy with generic answers because who knows in what situation I'll run into it again.

+1  A: 

You may be able to reduce lock contention (and thus improve throughput) by using "lock striping": essentially, split one lock into several, each lock protecting a particular group of users.
The tricky part is figuring out how to assign users to groups. The simplest case is when you can assign a request from any user to any group. If your data model requires that requests from one user be processed sequentially, you must introduce some mapping between user requests and groups. Here's a sample implementation of StripedLock:

import java.util.concurrent.locks.ReentrantLock;

/**
 * Striped locks holder, contains array of {@link java.util.concurrent.locks.ReentrantLock}, on which lock/unlock
 * operations are performed. Purpose of this is to decrease lock contention.
 * <p>When a client requests a lock, it supplies an integer argument from which the target lock is derived:
 * the index of the lock in the array equals <code>id & (locks.length - 1)</code>.
 * Since <code>locks.length</code> is a power of 2, <code>locks.length - 1</code> is a string of '1' bits,
 * which means that all lower bits of the argument are taken into account.
 * <p>The number of locks it can hold is bounded: it may be any value from the set {2, 4, 8, 16, 32, 64}.
 */
public class StripedLock {
    private final ReentrantLock[] locks;

    /**
     * Default ctor, creates 16 locks
     */
    public StripedLock() {
        this(4);
    }

    /**
     * Creates the array of locks; its size may be any value from the set {2, 4, 8, 16, 32, 64}.
     * @param storagePower the array size will be equal to <code>2^storagePower</code>
     */
    public StripedLock(int storagePower) {
        if (storagePower < 1 || storagePower > 6)
            throw new IllegalArgumentException("storage power must be in [1..6]");
        locks = new ReentrantLock[1 << storagePower]; // 2^storagePower, guaranteed a power of 2
        for (int i = 0; i < locks.length; i++)
            locks[i] = new ReentrantLock();
    }

    /**
     * Locks lock associated with given id.
     * @param id value, from which lock is derived
     */
    public void lock(int id) {
        getLock(id).lock();
    }

    /**
     * Unlocks lock associated with given id.
     * @param id value, from which lock is derived 
     */
    public void unlock(int id) {
        getLock(id).unlock();
    }

    /**
     * Map function between integer and lock from locks array
     * @param id argument
     * @return lock which is result of function 
     */
    private ReentrantLock getLock(int id) {
        return locks[id & (locks.length - 1)];
    }
}
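To illustrate how this might be applied to the question's scenario, here is a hedged usage sketch (my own, not part of the answer's code): each cached resource id maps to one stripe, so refreshes of different resources don't block each other. The cache methods are hypothetical stand-ins for the question's.

```java
import java.util.concurrent.locks.ReentrantLock;

// Usage sketch: 16 stripes, one resource id per stripe via id & (length - 1).
public class ResourceCache {
    private final ReentrantLock[] stripes = new ReentrantLock[16];
    { for (int i = 0; i < stripes.length; i++) stripes[i] = new ReentrantLock(); }

    // Hypothetical stand-ins for the question's methods.
    private boolean cachedResourceIsStale(int id) { return false; }
    private void downloadNewVersionOfResource(int id) { /* slow network I/O */ }

    int stripeIndex(int id) { return id & (stripes.length - 1); }

    public void refreshIfStale(int id) {
        ReentrantLock lock = stripes[stripeIndex(id)];
        lock.lock();                 // blocks only callers sharing this stripe
        try {
            if (cachedResourceIsStale(id)) {
                downloadNewVersionOfResource(id);
            }
        } finally {
            lock.unlock();           // always release, even on exceptions
        }
    }

    public static void main(String[] args) {
        ResourceCache cache = new ResourceCache();
        // ids 3 and 19 share a stripe (19 & 15 == 3); id 4 uses a different one
        System.out.println(cache.stripeIndex(3));   // 3
        System.out.println(cache.stripeIndex(19));  // 3
        System.out.println(cache.stripeIndex(4));   // 4
    }
}
```

Note that this only helps if there are multiple independent resources; with a single shared resource, all requests still map to the same stripe.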
Victor Sorokin
+3  A: 

Assumptions

  1. efficient means that doGet() should complete as quickly as possible
  2. cachedResourceIsStale() takes no time at all
  3. downloadNewVersionOfResource() takes a little time

Answer

Synchronizing reduces network load, because only one thread fetches the resource when it expires. Also, it will not unduly delay processing of other threads: since the VM contains no current snapshot the threads could return, they'll have to block, and there is no reason an additional concurrent downloadNewVersionOfResource() would complete any faster (I'd expect the opposite, due to network bandwidth contention).

So synchronizing is good, and optimal in bandwidth consumption and response times. (The CPU overhead of synchronization is vanishingly small compared to I/O waits.) This assumes that a current version of the resource might not be available when doGet() is invoked; if your server always had a current version of the resource, it could send it back without delay. (You might have a background thread download the new version just before the old one expires.)
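That background-refresh idea can be sketched with a ScheduledExecutorService (a sketch under assumptions, not the answer's code: the refresh period and the download method are placeholders):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class BackgroundRefreshCache {
    // Readers always see the last fully published snapshot, never a partial one.
    private final AtomicReference<Object> resource = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Hypothetical stand-in for the question's download method.
    private Object downloadNewVersionOfResource() { return new Object(); }

    public BackgroundRefreshCache(long refreshPeriodSeconds) {
        resource.set(downloadNewVersionOfResource());   // initial fill
        // Refresh in the background before callers ever see a stale copy.
        scheduler.scheduleAtFixedRate(
                () -> resource.set(downloadNewVersionOfResource()),
                refreshPeriodSeconds, refreshPeriodSeconds, TimeUnit.SECONDS);
    }

    // doGet never blocks: it returns the current snapshot immediately.
    public Object doGet() {
        return resource.get();
    }
}
```

With this layout, request threads never pay for the download at all; the trade-off is that a refresh may run even when no one is requesting the resource.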

PS

You haven't shown any error handling. You'll have to decide whether to propagate exceptions thrown by downloadNewVersionOfResource() to your callers or continue to serve the old version of the resource.
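One of those two policies, serving the old version on failure, could look roughly like this (a sketch of my own, assuming a hypothetical download method that throws IOException):

```java
import java.io.IOException;

public class StaleTolerantCache {
    private final Object lock = new Object();
    private volatile Object resource = "initial version"; // last good copy

    // Hypothetical stand-ins for the question's methods; this download
    // always fails, to show the fallback path.
    private boolean cachedResourceIsStale() { return true; }
    private Object downloadNewVersionOfResource() throws IOException {
        throw new IOException("network down");
    }

    public Object doGet() {
        synchronized (lock) {
            if (cachedResourceIsStale()) {
                try {
                    resource = downloadNewVersionOfResource();
                } catch (IOException e) {
                    // Policy: keep serving the stale copy (and log the failure)
                    // rather than propagating the exception to every caller.
                }
            }
        }
        return resource;
    }
}
```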

Edit

So? Let's assume you have 100 connection workers, that the check whether the resource is stale takes one microsecond, that the resource is not stale, and that serving it takes one second. Then, on average, 100 * 10^-6 / 1 = 0.0001 threads are trying to get the lock. Barely any contention at all. And the overhead of acquiring an uncontended lock is on the order of 10^-8 seconds. There is no point optimizing things that already take microseconds when the network will cause delays of milliseconds. And if you don't believe me, do a microbenchmark for synchronization.

It is true that frequent, needless synchronization adds significant overhead, and that the synchronized collection classes were deprecated for that reason. But that's because those methods do very little work per invocation, so the relative overhead of synchronization was much bigger. I just did a small microbenchmark for the following code:

synchronized (lock) {
    c++;
}

On my notebook, this takes 50 nanoseconds (5 * 10^-8 seconds), averaged over 10 million executions in Sun's HotSpot VM. This is about 20 times as long as the naked increment operation, so if one does a lot of increments, synchronizing each of them will slow the program by an order of magnitude. If, however, that method did blocking I/O, waiting, say, 1 ms, adding those same 50 nanoseconds would reduce throughput by 0.005%. Surely you have better opportunities for performance tuning :-)
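A rough harness along the lines of that microbenchmark might look like this (my own sketch, not the original; naive System.nanoTime timing with a single warm-up pass, so results are subject to JIT and scheduling noise):

```java
public class SyncBenchmark {
    static final Object lock = new Object();
    static long c;

    static void run(int iterations) {
        for (int i = 0; i < iterations; i++) {
            synchronized (lock) {
                c++;
            }
        }
    }

    public static void main(String[] args) {
        final int iterations = 10_000_000;
        run(iterations);          // warm-up so the JIT compiles the loop first
        c = 0;
        long start = System.nanoTime();
        run(iterations);          // timed pass
        long elapsed = System.nanoTime() - start;
        System.out.println("avg ns per synchronized increment: "
                + (double) elapsed / iterations);
    }
}
```

For anything beyond a sanity check, a harness like JMH is the better tool, since it handles warm-up and dead-code elimination for you.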

This is why you should always measure before starting to optimize: it prevents you from investing hours of your time to save a couple of nanoseconds of processor time.

meriton
Thanks for explaining why synchronization is good, but I already knew that. What I'm interested in is optimizing it. Most of the time the cache will not be stale, but all requests still need to lock, verify this, and unlock, one after the other.
Bart van Heukelom
Added some more detail why synchronization overhead doesn't matter in this case.
meriton
I agree with meriton. The cost of that code block is negligible compared to the cost of serving the request. Even if you completely optimize that piece and make it 100 times faster, the gain in throughput is going to be close to zero. Just in case, run a profiler under a production load to confirm.
Slava Imeshev