views:

1412

answers:

11

Here's the deal. I have a hash map containing data I call "program codes", it lives in an object, like so:

Class Metadata
{
    private HashMap validProgramCodes;
    public HashMap getValidProgramCodes() { return validProgramCodes; }
    public void setValidProgramCodes(HashMap h) { validProgramCodes = h; }
}

I have lots and lots of reader threads each of which will call getValidProgramCodes() once and then use that hashmap as a read-only resource.

So far so good. Here's where we get interesting.

I want to put in a timer which every so often generates a new list of valid program codes (never mind how), and calls setValidProgramCodes.

My theory -- which I need help to validate -- is that I can continue using the code as is, without putting in explicit synchronization. It goes like this: At the time that validProgramCodes are updated, the value of validProgramCodes is always good -- it is a pointer to either the new or the old hashmap. This is the assumption upon which everything hinges. A reader who has the old hashmap is okay; he can continue to use the old value, as it will not be garbage collected until he releases it. Each reader is transient; it will die soon and be replaced by a new one who will pick up the new value.

Does this hold water? My main goal is to avoid costly synchronization and blocking in the overwhelming majority of cases where no update is happening. We only update once per hour or so, and readers are constantly flickering in and out.

A: 

I think it's risky. Threading results in all kinds of subtly issues that are a giant pain to debug. You might want to look at FastHashMap, which is intended for read-only threading cases like this.

At the least, I'd also declare validProgramCodes to be volatile so that the reference won't get optimized into a register or something.

sblundy
FastHashMap is absurdly risky!
Tom Hawtin - tackline
+18  A: 
erickson
thanx for great lesson on multithreading.
badbadboy
Excellent answer.
Christian Vest Hansen
+1  A: 

The assignment will work as long as you're not concerned about reading stale values, and as long as you can guarantee that your hashmap is properly populated on initialization. You should at the least create the hashMap with Collections.unmodifiableMap on the Hashmap to guarantee that your readers won't be changing/deleting objects from the map, and to avoid multiple threads stepping on each others toes and invalidating iterators when other threads destroy.

( writer above is right about the volatile, should've seen that)

Steve B.
Just to be clear, the way you guarantee that "your hashmap is properly populated on initialization" is to use synchronization :) Either synchronized, volatile, or something from java.util.concurrent.
Scott Bale
A: 

I don't know much about threading but I am trying to think together with you, this is more like a question to you and the others.

  1. What can happen if the assignment operation is "half-done", and then another thread tried to read the reference?

  2. If you put a lock around the assignment block, does this mean that the readers will have to wait for a little sec? But thus will only happen once in an hour right?

badbadboy
1. They'll keep reading stale values from the old HashMap. 2. No, they won't have to wait. Only new readers trying to get access to the HashMap will have to wait.
Bill the Lizard
2. If you only put a lock around the assignment block, and not around the getter, then you only guarantee that two threads don't try to set a new map at the same time. It won't affect threads calling the getter at all.
mtruesdell
+2  A: 

I think your assumptions are correct. The only thing I would do is set the validProgramCodes volatile.

private volatile HashMap validProgramCodes;

This way, when you update the "pointer" of validProgramCodes you guaranty that all threads access the same latest HasMap "pointer" because they don't rely on local thread cache and go directly to memory.

bruno conde
A: 

If I read the JLS correctly (no guarantees there!), accesses to references are always atomic, period. See Section 17.7.

So, if the access to a reference is always atomic and it doesn't matter what instance of the returned Hashmap the threads see, you should be OK. You won't see partial writes to the reference, ever.


Edit: After review of the discussion in the comments below and other answers, here are references/quotes from

Doug Lea's book (Concurrent Programming in Java, 2nd Ed), p 94, section 2.2.7.2 Visibility, item #3: "

The first time a thread access a field of an object, it sees either the initial value of the field or the value since written by some other thread."

On p. 94, Lea goes on to describe risks associated with this approach:

The memory model guarantees that, given the eventual occurrence of the above operations, a particular update to a particular field made by one thread will eventually be visible to another. But eventually can be an arbitrarily long time.

So when it absolutely, positively, must be visible to any calling thread, volatile or some other synchronization barrier is required, especially in long running threads or threads that access the value in a loop (as Lea says).

However, in the case where there is a short lived thread, as implied by the question, with new threads for new readers and it does not impact the application to read stale data, synchronization is not required.


@erickson's answer is the safest in this situation, guaranteeing that other threads will see the changes to the HashMap reference as they occur. I'd suggest following that advice simply to avoid the confusion over the requirements and implementation that resulted in the "down votes" on this answer and the discussion below.

I'm not deleting the answer in the hope that it will be useful. I'm not looking for the "Peer Pressure" badge... ;-)

Ken Gentle
You won't see partial writes, but you might never see a write at all, unless the variable is volatile or the reading and writing threads synchronize.
erickson
No kidding. Please read the question and response carefully - the app doesn't care if the old values never change as long as new readers pick up the change. The response says "if ... it doesn't matter what instance ..."
Ken Gentle
Even new readers aren't guaranteed to pick up the change, ever. Without specifying a barrier, a write doesn't have to be flushed to main memory where it's visible to other threads. You might never encounter the bug in tests. When it offers a cheap guarantee of safety, why not use volatile?
erickson
@erickson: If there is something incorrect with the answer, please post what is incorrect. You're right, volatile with force a write. There is a cost associated with volatile. With a new thread and new reader, there is no data race - it will get the correct value. See 17.4
Ken Gentle
Strictly speaking, there's nothing incorrect in your answer, given your conditions. That's why I didn't down vote it. But, I believe the question implies that updates should be seen eventually, and that readers can safely use a new map when they see it. Without volatile, these aren't guaranteed.
erickson
@erickson: You seem to be downvoting others based on your assumption that volatile is the correct answer. Noone is claiming that answer is incorrect, but the other answers are not incorrect either. You are also basing your statements on false assumptions of your interpretation of the question.
Robin
My mistake on the downvoting, sorry!
Robin
"When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race." If validProgramCodes is not volatile, what establishes the happens-before relationship?
erickson
@Robin, I haven't downvoted any responses to this question. Also, I am inferring things about the question, but not without reason; certainly the assumptions have not been shown false.
erickson
@Erickson: I'm ready to take this to the Advanced Java list to let Jeff Kesselman settle it once and for all. We're mostly in agreement - we're disagreeing on conditions and the requirements of the application. I don't believe 17.4.1 applies to the application as described.
Ken Gentle
After some research into what erickson was saying, I see his point. It also raised a question. If the update is done on another thread that exits after updating, I assume that would force an update of main memory, which would mean volatile is unnecessary. If it does not exit, then it is required.
Robin
That last one is a question by the way, as it was an assumption of how the updates would work. Enlighten me if that is incorrect. (I have already been enlightened some by the way, thanks 8-)
Robin
As I answered and commented elsewhere, the bigger potential problem hasn't been mentioned here yet, which is that w/o synchronization a new HashMap is not safely published, and so may become visible to reader threads IN AN INVALID STATE (i.e. it's content not yet visible to the reader threads).
Scott Bale
Are you asserting that the first time a new thread reads a field, it is guaranteed that *the* most recent write to the field by another thread is visible? Or merely that *some* previous write is visible? Why do you stress the newness of the thread?
erickson
No, I'm not. In the general case for a multi-threaded, concurrent application you are absolutely correct, volatile or a barrier is required to guarantee that one thread will get _the_ most recent value from another thread. In this particular case, the questioner is probably better off doing so.
Ken Gentle
The assignment of a new HashMap instance to the "validProgramCodes" field is not the issue. Yes, assignment is atomic. In the absence of proper synchronization, the assignment of that field AND THE POPULATION OF THAT MAP may appear reordered from the POV of a reader thread. Google "Safe Publication"
Scott Bale
@Scott: I've already acknowledged that @erickson's answer is the safest. Yes, you're correct, partial initialization of the HashMap is possible. So do I delete this answer or not? It is not incorrect - just different.
Ken Gentle
@Ken No I wouldn't delete. But imho some of your wording IS incorrect, particularly: "However, in the case where there is a short lived thread, as implied by the question, with new threads for new readers and it does not impact the application to read stale data, synchronization is not required."
Scott Bale
@Ken (cont) Synchronization IS required, unless it is okay for a thread to see INCORRECT data (that is, a HashMap with partial, missing data).
Scott Bale
@Scott: We don't know how the HashMap is constructed or initialized - you're making an assumption that "validProgramCodes" is not actually referring to a 'Collections.synchronizedMap' which would be guaranteed to be completely initialized on access, right? We don't know what it is.
Ken Gentle
@Ken "validProgramCodes" type is HashMap, not Map, so it cannot refer to anything other than a HashMap (or HashMap subclass). But aside from that, Collections$SynchronizedMap is only safe if it is safely published, it's constructor does not have any synchronization, so even that idea would not work.
Scott Bale
@Ken Yes, I am assuming Metadata is part of a public API, that it cannot control what parameters are passed in, so it has the responsibility to provide proper synchronization if the class is to be used concurrently. If you were the PM would you be comfortable okaying this API as is?
Scott Bale
@Scott: DOH! on the Map/syncrhonizedMap/HashMap - however, the assumption about initialization stands. On whether I would approve the API, it depends on the requirements and the entire class implementation. I'm assuming the example is intentionally incomplete and would not be committed as is.
Ken Gentle
@Ken Assumptions with this limited example are unavoidable. You have a point that synchronization could either happen within Metadata or externally (as long as it were consistently applied). But something (likely synchronization) must happen SOMEWHERE to guarantee safe publication in this example.
Scott Bale
A: 

I have to go with Ken G and Bill the Lizard on this one. Your assumptions will work fine. The only addition I would recommend is that the HashMap is made unmodifiable (as has been suggested).

Also, depending on how the reader threads are using your code table, marking it as volatile may actually cause logical problems, as successive access to the map may give an inconsistent view of the data stored within it. That is, getting some old and some new values from consecutive reads on the same thread may be logically inconsistent.

Robin
If I am going to get downvoted, do the courtesy of giving a reason why. I noticed that both the answers I referred to have been downvoted as well when they are perfectly good responses to the actual question.
Robin
The question implies that a reader thread eventually *will* see updates; otherwise, validProgramCodes would appear to be "final" to readers. An update is invisible until it is flushed from the cache to main memory, and that may never happen if the variable access is not volatile or synchronized.
erickson
No, it explicitly says "A reader who has the old hashmap is okay; he can continue to use the old value, as it will not be garbage collected until he releases it. Each reader is transient; it will die soon and be replaced by a new one who will pick up the new value". Your statement is incorrect.
Robin
Are you saying that "a new one who will pick up the new value" means that it's okay if new readers never see the new value? Without a happens-before condition between read and write, they may not. Worse, a reader might start to use the map as it's being populated, raising ConcurrentModificationEx.
erickson
New readers would get the new HashMap when it has been reassigned. The concurrent mod problem doesn't exist since the map isn't being modified, it is being replaced. Any thread started after the switch will get the new map, existing threads are referencing the old one.
Robin
Can you cite the language specification that guarantees that, "Any thread started after the switch will get the new map"? I believe this statement is false in the absence of a happens-before relationship as defined by the JLS, making your approach unsafe. See JLS §17.3.
erickson
The JMM FAQ has some straightforward, relevant explanations too: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#reordering
erickson
Whether or not it's okay for new reader threads to never see an update to the HashMap field, what's NOT okay is that a reader thread might see a partially-initialized HashMap. Check out "Reordering" section of @erickson's answer, or google "safe publication".
Scott Bale
Reordering does not make the assignment non-atomic. So how can it cause a partially-initialized map? @ericson is right about the happens before semantics, but that still does not cause this issue, only the possibility of not seeing the new map at all.
Robin
@Robin - yes the assignment is atomic, that's not the potential problem. The problem is that the assignment may be visible to the reader thread BEFORE the HashMap's state initialization is visible. Those two different things require synchronization to happen atomically (i.e. safe publication).
Scott Bale
@Robin, in other words, if Thread "Writer" instantiates a HashMap, adds a bunch of stuff, and sets the reference, without synchronization Thread "Reader" may obtain a reference to that HashMap but with only some or none of the contents of the HashMap visible to that Reader thread.
Scott Bale
I don't think so. That could only occur if the writer thread was reordered so that the map is created, assignment is made, and then the map is initialized. Pretty sure this would break the as-if-serial semantics on the writer thread.
Robin
@Robin, it's actually *within thread* as-if-serial semantics. So the JVM is free to reorder operations as long as the program result is the same as if executed sequentially by that single thread. But that's no guarantee that operations will appear out of order from the POV of other threads.
Scott Bale
@Robin, additionally, reordering isn't the only potential trick a jvm can play. There's also caching. In practice it's unlikely, but in theory the jvm could cache the updates to the HashMap state on a register or something rather than main memory, so the state would not be visible to reader thread
Scott Bale
Really the bigger picture here (as a colleague just said) is that with the JMM and use of proper synchronization things are guaranteed to happen as you expect in a multi-threaded environment. It's not worth skimping on.
Scott Bale
@Robin, do not be so stubborn if you are not sure :) This is the whole point of SO to learn something new. And I think getting some down votes is a good way of remembering threading can be really complicated and can cost a lot.
badbadboy
No question that threading is complicated. But the comments are the only place here to discuss the question and answers. Obviously Scott and I disagree about certain semantics and we are unable to convince each other of our points of view. Face to face, I am sure it would be resolved in 5 min.
Robin
Anyway, time to get back to work.
Robin
I would urge all Java developers to read Goetz' Java Concurrency in Practice. When I read it, I was dismayed at how badly I misunderstood many things about Java multi-threading and synchronization.
Scott Bale
@Robin, okey. Discussion is always fun...and this question was very helpful
badbadboy
+3  A: 

No, by the Java Memory Model (JMM), this is not thread-safe.

There is no happens-before relation between writing and reading the HashMap implementation objects. So, although the writer thread appears to write out the object first and then the reference, a reader thread may not see the same order.

As also mentioned there is no guarantee that the reaer thread will ever see the new value. In practice with current compilers on existing hardware the value should get updated, unless the loop body is sufficienly small that it can be sufficiently inlined.

So, making the reference volatile is adequate under the new JMM. It is unlikely to make a substantial difference to system performance.

The moral of this story: Threading is difficult. Don't try to be clever, because sometimes (may be not on your test system) you wont be clever enough.

Tom Hawtin - tackline
+3  A: 

No, the code example is not safe, because there is no safe publication of any new HashMap instances. Without any synchronization, there is a possibility that a reader thread will see a partially initialized HashMap.

Check out @erickson's explanation under "Reordering" in his answer. Also I can't recommend Brian Goetz's book Java Concurrency in Practice enough!

Whether or not it is okay with you that reader threads might see old (stale) HashMap references, or might even never see a new reference, is beside the point. The worst thing that can happen is that a reader thread might obtain reference to and attempt to access a HashMap instance that is not yet initialized and not ready to be accessed.

Scott Bale
+2  A: 

As others have already noted, this is not safe and you shouldn't do this. You need either volatile or synchronized here to force other threads to see the change.

What hasn't been mentioned is that synchronized and especially volatile are probably a lot faster than you think. If it's actually a performance bottleneck in your app, then I'll eat this web page.

Another option (probably slower than volatile, but YMMV) is to use a ReentrantReadWriteLock to protect access so that multiple concurrent readers can read it. And if that's still a performance bottleneck, I'll eat this whole web site.

  public class Metadata
  {
    private HashMap validProgramCodes;
    private ReadWriteLock lock = new ReentrantReadWriteLock();

    public HashMap getValidProgramCodes() { 
      lock.readLock().lock();
      try {
        return validProgramCodes; 
      } finally {
        lock.readLock().unlock();
      }
    }

    public void setValidProgramCodes(HashMap h) { 
      lock.writeLock().lock();
      try {
        validProgramCodes = h; 
      } finally {
        lock.writeLock().unlock();
      }
    }
  }
Alex Miller
+1  A: 

While this is not the best solution for this particular problem (erickson's idea of a new unmodifiableMap is), I'd like to take a moment to mention the java.util.concurrent.ConcurrentHashMap class introduced in Java 5, a version of HashMap specifically built with concurrency in mind. This construct does not block on reads.

R. Bemrose