Hi,

I am trying to plug some shameful gaps in my Java threading knowledge, so I'm reading Java Concurrency in Practice by Brian Goetz et al. (highly recommended, BTW), and one of the early examples in the book leaves me with a question. In the following code I totally understand why synchronization is needed when updating the hits and cacheHits member variables, but why is it also needed in the getHits method, which only reads the hits variable?

Example code from chapter 2:

public class CachedFactorizer extends GenericServlet implements Servlet {
    private BigInteger lastNumber;
    private BigInteger[] lastFactors;
    private long hits;
    private long cacheHits;

    public synchronized long getHits() {
        return hits;
    }

    public synchronized double getCacheHitRatio() {
        return (double) cacheHits / (double) hits;
    }

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = null;
        synchronized (this) {
            ++hits;
            if (i.equals(lastNumber)) {
                ++cacheHits;
                factors = lastFactors.clone();
            }
        }
        if (factors == null) {
            factors = factor(i);
            synchronized (this) {
                lastNumber = i;
                lastFactors = factors.clone();
            }
        }
        encodeIntoResponse(resp, factors);
    }
    // ...

I have a feeling it has to do with atomicity, monitors and locks, but I don't fully understand those, so could someone out there please explain in a little more depth?

Thanks in advance...

James

+3  A: 

Because otherwise you may see a stale value of hits. Making the field volatile would also have worked. This article is relevant: http://www.ibm.com/developerworks/java/library/j-jtp04186/index.html
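
For illustration, here is a minimal sketch (mine, not from this answer or the book) of what the volatile alternative might look like inside the same CachedFactorizer class: a volatile hits field lets the getter drop its lock, while the increments and getCacheHitRatio keep synchronizing so the two counters stay consistent with each other.

// Hypothetical variation, assuming only getHits() should avoid locking.
private volatile long hits;          // volatile: a read always sees the most recent write
private long cacheHits;              // still guarded by the intrinsic lock

public long getHits() {
    return hits;                     // no synchronized needed: this is a volatile read
}

public synchronized double getCacheHitRatio() {
    return (double) cacheHits / (double) hits;   // still locked: needs a consistent pair
}

// The ++hits and ++cacheHits increments stay inside synchronized (this) blocks,
// because ++ is a read-modify-write and volatile alone does not make it atomic.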

David Soroko
So it's purely for the sake of consistency, i.e. Java doesn't force you to do it, and you don't run the risk of an exception being thrown?
James B
I believe that David's second point is the more important one (though the first is still accurate). But volatile would not work for the non-atomic issue, so you still have to use synchronization to get both effects.
CWF
@user282312: (can't you get a real handle <smile>). Actually, I was surprised by the link to the JLS in Michael's answer below, which indicates that volatile longs must indeed have atomic semantics.
Software Monkey
+1  A: 

The basic rule of thumb is that when you need synchronized access to a variable, you need it for both reads and writes. Otherwise there is a chance that a value updated by one thread will never become visible to the other threads...

Guillaume
+3  A: 

The idea of monitors and locks is to prevent two separate threads from simultaneously accessing or modifying a resource that both of them use. You are effectively asking "But why lock when I'm only reading the data?", and the answer is that if you don't, someone else could modify it at the same time and you would not be aware of the new value. Depending on the logic of your program, that can cause a number of difficult-to-catch bugs as it executes, commonly known as race conditions...

When working with threads, code defensively, tread (and indeed thread) carefully.

Martin Milan
+1 because I'm a sucker for a good pun! - Thanks
James B
-1: This answer is wrong. *So what* if another thread modifies the variable and I'm not aware of the new value? It's no different than if I did the synchronized read just before the change happened.
Michael Borgwardt
@Michael - so would you not synchronize the `getHits` method?
James B
I do not really understand the example "and you would not be aware of the new value". After `getHits()` returns, the value of `hits` may change with or without synchronization.
David Soroko
@James B: I would (or make the variable volatile). But I'd do it because as a long, the value is not atomic, and because of cache issues. Not because "someone else could modify it at the same time".
Michael Borgwardt
@Michael: It is not the same as performing the read "just before" doing the write. The CPU cache value may live for a relatively long time so it might be a case of reading the old value indefinitely after it has been updated. Synchronization or volatility (Java5+ for the latter) is required to ensure you see the correct value on read.
Software Monkey
After reading Michael's answer, I see the subtlety he is picking at here - it's not that there is only a problem with a modification "at the same time". This answer is close, but subtly either incorrect or misleading. If I could I would retract my upvote (but I wouldn't downvote it).
Software Monkey
+1  A: 

Because hits is not marked as volatile. If you are accessing a variable that can be changed by more than one thread, like this one, you have to mark it as volatile or access it in a synchronized block.

EDIT: As said here and exemplified in the Effective Java book, if you don't access it through a synchronized block you might not see a value that was changed by another thread. Note that this happens because the variable is not final and is not one of the wrapper classes in the concurrency package (e.g. AtomicLong). If it were final and one of those classes, the synchronized block wouldn't be necessary.

... another option would be to use an AtomicLong and make the field final. That field wouldn't have to be volatile because the reference itself never changes, only its contents, and those are managed by the AtomicLong class.
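
For illustration, a minimal sketch (mine, not Ravi's or the book's) of what the counters might look like as final AtomicLong fields; the getters then need no lock, though getCacheHitRatio still can't see the two counters as one atomic snapshot without extra synchronization.

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical variant of the counter fields using AtomicLong.
private final AtomicLong hits = new AtomicLong();
private final AtomicLong cacheHits = new AtomicLong();

public long getHits() {
    return hits.get();                  // no lock needed: AtomicLong guarantees visibility
}

// Inside service(), the increments would become:
//     hits.incrementAndGet();
//     if (i.equals(lastNumber)) { cacheHits.incrementAndGet(); ... }
// The lastNumber/lastFactors pair still needs the synchronized blocks,
// because those two fields must be updated together.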

Ravi Wallau
+6  A: 

Apart from the fact that, without synchronization, a thread's cached view of a non-volatile variable may stay out of date indefinitely, there is another problem:

read/write access to a long or double is NOT guaranteed to be atomic (see the JLS section on non-atomic treatment of double and long, §17.7)!

You could theoretically end up seeing a value in which only the first or last 4 bytes have been updated. However, volatile solves this problem as well.
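
To make the torn-read risk concrete, here is a small stand-alone sketch (mine, not Michael's). Without volatile or synchronization, the JLS permits a reader on a 32-bit JVM to observe a long made of half of each write, although many JVMs never actually produce one in practice.

// Illustrative only: a non-volatile long written by one thread, read by another.
public class TornLongDemo {
    static long value;   // deliberately NOT volatile and NOT synchronized

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            while (true) {
                value = 0L;    // all bits zero
                value = -1L;   // all bits one (0xFFFFFFFFFFFFFFFF)
            }
        });
        Thread reader = new Thread(() -> {
            while (true) {
                long v = value;                    // may legally be a mix of the two writes
                if (v != 0L && v != -1L) {
                    System.out.println("Torn read: 0x" + Long.toHexString(v));
                }
            }
        });
        writer.setDaemon(true);
        reader.setDaemon(true);
        writer.start();
        reader.start();
        try { Thread.sleep(5000); } catch (InterruptedException ignored) { }
    }
}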

Michael Borgwardt
+1 - great link, that is a real potential gotcha!
James B
+1  A: 

There are multiple potential issues here. Michael has pointed out one big one (non-atomicity of long stores), but there's another. In the absence of a 'happens-before' relationship (which is provided between the release and the subsequent acquisition of a lock, e.g. by a synchronized block), the writes can be seen out of order.

Notice that the line ++hits comes before the ++cacheHits in service(). In the absence of synchronized, the JVM is quite entitled to reorder those instructions in a way which may seem confusing to other threads. For example, it may reorder the ++cacheHits before the ++hits, or it may make the increased value of cacheHits visible to other threads before the increased value of hits (in this case the distinction isn't important, because the result could be the same). Imagine a reordering, starting from a clean cache, which results in the following interleaving:

Thread 1                  Thread 2
---------------           ----------------
++cacheHits (reordered)
  cacheHits=1, hits=0
                          read hits (as 0)
++hits
  cacheHits=1, hits=1
                          read cacheHits (as 1)

                          calculate 1 / 0 (= epic fail)

You won't get the result you're expecting, that's for sure.

Mind you, at least that case is easy to spot. You could instead have 1000 service() calls after which the reading thread sees cacheHits as 500 but hits as 1; a 50,000% cache hit ratio is probably less obvious and even more confusing to the poor debugger.

Synchronizing the read sets up a happens-before relationship so this won't happen, and the locking then provides the other advantages that others mention.

Cowan
Thanks, it was a close-run thing between accepting yours or Michael B's answer; I chose yours because you covered the compiler reordering. Thanks!
James B
A: 

If you are concerned about the performance of the code (which is my guess), you should know that (1) stability is preferred over raw speed in concurrent applications, and (2) other people at Sun had the same concern and devised this:

ReentrantReadWriteLock
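
For illustration (my sketch, not dimitko's), the hit counters could be guarded by a ReentrantReadWriteLock so that many readers can call getHits concurrently while writers still exclude each other; whether this actually beats plain synchronized for such tiny critical sections is something you would have to measure.

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical read/write-locked counters, not the book's version.
public class HitCounter {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long hits;
    private long cacheHits;

    public void recordHit(boolean cacheHit) {
        lock.writeLock().lock();           // exclusive: mutates both counters
        try {
            ++hits;
            if (cacheHit) {
                ++cacheHits;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public long getHits() {
        lock.readLock().lock();            // shared: many readers may hold this at once
        try {
            return hits;
        } finally {
            lock.readLock().unlock();
        }
    }

    public double getCacheHitRatio() {
        lock.readLock().lock();            // read both counters under one lock acquisition
        try {
            return (double) cacheHits / (double) hits;
        } finally {
            lock.readLock().unlock();
        }
    }
}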

dimitko