I'm trying to use double-checked locking to maintain an array of binomial coefficients, but I read recently that double-checked locking doesn't work. Efficiency is extremely important here, so using volatile isn't an option unless it's only inside the conditional statements.

I can't see a way to use a static class with a singleton object: this is part of a framework, and I don't know what kinds of numbers people will need the function for (or whether it will be used at all), so I can't guess what the maximum chosen value will be. The only thing I can think of is to make everything non-static and insist that each thread that needs this method instantiate a Choose object with its own array. It seems like that shouldn't be necessary.

public static final class Util {
    /**
     * Static array of nCr values
     */
    public static long[][] nCr_arr;

    /**
     * Calculate binomial coefficient (n k)
     *
     * @param n
     *            n
     * @param k
     *            k
     * @return n choose k
     */
    public static long nCr(int n, int k) {
        if (k < 0)
            throw new ArithmeticException("Cannot choose a negative number");
        if (n < 0) {
            if (k % 2 == 0)
                return nCr(-n + k - 1, k);
            else
                return -nCr(-n + k - 1, k);
        }
        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k;
        if (nCr_arr == null) {
            synchronized (Util.class) {
                if (nCr_arr == null)
                    nCr_arr = new long[n + 1][];
            }
        }
        if (nCr_arr.length <= n) {
            synchronized (Util.class) {
                if (nCr_arr.length <= n) {
                    long[][] newNCR = new long[n + 1][];
                    System.arraycopy(nCr_arr, 0, newNCR, 0, nCr_arr.length);
                    nCr_arr = newNCR;
                }
            }
        }
        if (nCr_arr[n] == null) {
            synchronized (Util.class) {
                if (nCr_arr[n] == null)
                    nCr_arr[n] = new long[k + 1];
            }
        }
        if (nCr_arr[n].length <= k) {
            synchronized (Util.class) {
                if (nCr_arr[n].length <= k) {
                    long[] newNCR = new long[k + 1];
                    System.arraycopy(nCr_arr[n], 0, newNCR, 0, nCr_arr[n].length);
                    nCr_arr[n] = newNCR;
                }
            }
        }
        if (nCr_arr[n][k] == 0) {
            if (k == 0)
                nCr_arr[n][k] = 1;
            else
                nCr_arr[n][k] = nCr(n, k - 1) * (n - (k - 1)) / k;
        }
        return nCr_arr[n][k];
    }
}
A: 

Well, you could always avoid double checked locking by changing the code from:

if (nCr_arr == null) {
    synchronized (Util.class) {
        if (nCr_arr == null)
            nCr_arr = new long[n + 1][];
    }
}

to this:

synchronized (Util.class) {
    if (nCr_arr == null)
        nCr_arr = new long[n + 1][];
}

I bet the performance impact would be very small.

krtek
Most probably the impact will be really high: you are locking on every use of the pointer, while double-checked locking only pays the cost of locking in the (rare) cases where the outer check finds the field uninitialized.
David Rodríguez - dribeas
@dribeas: Likely not unless this is highly contended. Recent JVMs do not do a real lock in that case so uncontended synchronize is quite fast. He would need to do some benchmarks in his specific environment to find out.
Kevin Brock
Well, it depends on whether this method is going to be called very often. Anyway, double-checked locking is still broken, so you could either use synchronisation or initialize the array at the first call of the method.
krtek
I'm running through a couple hundred million game positions per job on 8 simultaneous threads (8-core cpu). For every position, this method may be called anywhere from 1 to 10 times. It is arguably the most used method in the entire program and any unnecessary synchronization will kill all hope of finishing in any reasonable amount of time. Although the array will only be expanded a couple times, it will constantly be accessed by multiple threads simultaneously so putting "synchronized" outside the conditional clauses most definitely will generate heaps of contended accesses.
dspyz
A: 

Are you sure you need to optimize this? Have you profiled the running code and found that the single lock is too expensive?

A: 

Or rewrite your code using the new Java concurrency API (http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/ReadWriteLock.html) and obtain the write lock only when it's really needed.
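
A minimal sketch of what that could look like with a ReentrantReadWriteLock, leaving out the negative-n handling from the question (the class name CoefficientCache and the computeDirectly helper are illustrative, not from any answer):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public final class CoefficientCache {
    private static final ReadWriteLock LOCK = new ReentrantReadWriteLock();
    private static long[][] table = new long[0][];   // guarded by LOCK

    public static long nCr(int n, int k) {
        if (k < 0)
            throw new ArithmeticException("Cannot choose a negative number");
        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k;

        // Fast path: any number of readers may hold the read lock at once.
        LOCK.readLock().lock();
        try {
            if (n < table.length && table[n] != null
                    && k < table[n].length && table[n][k] != 0)
                return table[n][k];
        } finally {
            LOCK.readLock().unlock();
        }

        // Slow path: take the write lock only when the value must be computed.
        LOCK.writeLock().lock();
        try {
            if (n >= table.length) {
                long[][] grown = new long[n + 1][];
                System.arraycopy(table, 0, grown, 0, table.length);
                table = grown;
            }
            if (table[n] == null || k >= table[n].length) {
                long[] row = new long[k + 1];
                if (table[n] != null)
                    System.arraycopy(table[n], 0, row, 0, table[n].length);
                table[n] = row;
            }
            if (table[n][k] == 0)
                table[n][k] = computeDirectly(n, k);
            return table[n][k];
        } finally {
            LOCK.writeLock().unlock();
        }
    }

    // Iterative multiplicative formula; each intermediate value is itself a
    // binomial coefficient, so the integer division is exact.
    private static long computeDirectly(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++)
            result = result * (n - k + i) / i;
        return result;
    }
}

Readers share the read lock on the common path; the write lock is taken only when the table has to grow or a value has to be filled in.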

krtek
Unfortunately, the API docs say, "if the read operations are too short the overhead of the read-write lock implementation [...] can dominate the execution cost, particularly as many read-write lock implementations still serialize all threads through a small section of code". Between this response and the ConcurrentHashMap solution, I can see clearly that Java tries its best to provide support for this sort of thing, and in this instance it's not good enough. So the best solution is to simply give a separate array to each thread.
dspyz
A: 

Given that you are using this in a very performance-critical part of your code, I recommend ditching the lazy initialization idea, because it requires several additional comparisons for every access to a coefficient.

Instead, I'd require the user of your library to specify at initialization time how many coefficients she needs. Alternatively, I'd precompute more than the user is ever likely to need: the full triangle for all n < 1000 is only about 500,000 long values, a few megabytes of memory.

PS: Might I suggest you use the recursive formula to compute a coefficient?

c[n][k] = c[n-1][k-1] + c[n-1][k]

It won't matter much, but why use the more complicated formula when all you need is Pascal's Triangle?
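
A minimal sketch of that precompute-up-front idea using the recurrence above (the class name PascalTable and the maxN parameter are illustrative):

/**
 * Precomputes Pascal's triangle up to a caller-supplied maximum n.
 */
public final class PascalTable {
    private final long[][] c;

    public PascalTable(int maxN) {
        c = new long[maxN + 1][];
        for (int n = 0; n <= maxN; n++) {
            c[n] = new long[n + 1];
            c[n][0] = c[n][n] = 1;
            for (int k = 1; k < n; k++) {
                // c[n][k] = c[n-1][k-1] + c[n-1][k]
                c[n][k] = c[n - 1][k - 1] + c[n - 1][k];
            }
        }
    }

    /** Read-only lookup; safe to share across threads once constructed. */
    public long nCr(int n, int k) {
        return (k < 0 || k > n) ? 0 : c[n][k];
    }
}

Once constructed, the table is read-only, so threads can share it without any locking.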

meriton
Either formula will do and you're quite right that I could simply pre-compute all the reasonable values I might need in this case. However, in the future I may need to create a 3-dimensional array for all multinomial coefficients [x^k](1-x^h)^n/(1-x), and then the amount I can easily precompute will significantly drop. Given a choice between having to set up an arbitrary maximum and creating Choose objects which threads are responsible for keeping track of, I'd rather go with the Choose objects.
dspyz
A: 

It looks like you are building a cache of the results as they are calculated, so you could use a concurrent map to hold the results, with a key that combines the two int values into a single long.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class Util {
    /**
     * Cache of nCr values, keyed by n and k packed into one long
     */
    private static final ConcurrentMap<Long,Long> CACHE = 
        new ConcurrentHashMap<Long, Long>();

    /**
     * Calculate binomial coefficient (n k)
     * 
     * @param n
     *            n
     * @param k
     *            k
     * @return n choose k
     */
    public static long nCr(int n, int k) {
        if (k < 0)
            throw new ArithmeticException("Cannot choose a negative number");
        if (n < 0) {
            if (k % 2 == 0)
                return nCr(-n + k - 1, k);
            else
                return -nCr(-n + k - 1, k);
        }

        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k;

        // Pack n into the high 32 bits and k into the low 32 bits
        // (a plain int shift by 32 would be a no-op).
        final long key = (((long) n) << 32) | k;

        Long value = CACHE.get(key);
        if (value != null) {
            return value.longValue();
        } 

        long result;

        if (k == 0)
            result = 1;
        else
            result = nCr(n, k - 1) * (n - (k - 1)) / k;

        CACHE.put(key, result);

        return result;
    }
}
Michael Barker
Looking at the ConcurrentHashMap source, I see that the get() method begins with "if (count != 0) { ...". But count is declared volatile, so this requires a direct memory access every time. In general, java.util is more about convenience than efficiency, so using things out of the collections framework may have hidden drawbacks.
dspyz
That's not always the case and depends on the CPU architecture. On Intel, volatile reads are quite cheap (an L1 or L2 cache access). That's far cheaper than a contended synchronized block, of which there are 4 potential cases in your code. Avoiding those classes is probably a premature optimisation, especially if you are struggling to produce a correct solution. Your code lacks anything on the read path that would trigger the happens-before semantics of the memory model, meaning some changes may not be visible (this is the flaw in the double-checked pattern without volatile).
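
(For reference, the classic repaired form of the pattern makes the shared reference volatile; a minimal sketch of that idiom for a single lazily-created array, not a drop-in fix for the nested arrays in the question:)

public final class LazyTable {
    // The volatile write/read pair supplies the happens-before edge described
    // above; without it a thread could see a stale or half-published reference.
    private static volatile long[][] table;

    public static long[][] get(int size) {
        long[][] local = table;              // single volatile read on the fast path
        if (local == null) {
            synchronized (LazyTable.class) {
                local = table;
                if (local == null) {
                    local = new long[size + 1][];
                    table = local;           // publish the fully-constructed array
                }
            }
        }
        return local;
    }
}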
Michael Barker
A: 

I ended up just making it not static. If a thread needs to get nCr values, it creates a new Coefficient object and holds onto it.
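
For reference, a minimal sketch of that per-thread, non-static approach (the class name and exact shape are my reconstruction, not code posted in the thread):

/**
 * Per-thread cache of binomial coefficients. Each thread creates and keeps
 * its own instance, so no synchronization or volatile reads are needed.
 */
public final class Coefficient {
    private long[][] table = new long[0][];

    public long nCr(int n, int k) {
        if (k < 0)
            throw new ArithmeticException("Cannot choose a negative number");
        if (n < 0)
            return (k % 2 == 0) ? nCr(-n + k - 1, k) : -nCr(-n + k - 1, k);
        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k;
        if (n >= table.length) {
            long[][] grown = new long[n + 1][];
            System.arraycopy(table, 0, grown, 0, table.length);
            table = grown;
        }
        if (table[n] == null || k >= table[n].length) {
            long[] row = new long[k + 1];
            if (table[n] != null)
                System.arraycopy(table[n], 0, row, 0, table[n].length);
            table[n] = row;
        }
        if (table[n][k] == 0)
            table[n][k] = (k == 0) ? 1 : nCr(n, k - 1) * (n - (k - 1)) / k;
        return table[n][k];
    }
}

Each thread constructs its own instance, so there is no shared state to synchronize on.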

dspyz