Hi all,

I've been mulling this over & reading, but I can't find an absolutely authoritative answer.

I have several deep data structures made up of objects containing ArrayLists, Strings & primitive values. I can guarantee that the data in these structures will not change (no thread will ever make structural changes to lists, change references, change primitives).

I'm wondering if reading data in these structures is thread safe; i.e. is it safe to recursively read variables from the objects, iterate the ArrayLists etc. to extract information from the structures in multiple threads without synchronization?

Thanks, Jon

+5  A: 

The only reason why it wouldn't be safe is if one thread were writing to a field while another thread was simultaneously reading from it. No race condition exists if the data is not changing. Making objects immutable is one way of guaranteeing that they are thread safe. Start by reading this article from IBM.

Amir Afghani
Jon M
Yeah, pretty much all the java.util collections are thread-safe for read-only usage. The only gotcha is LinkedHashMap/Set with a custom removeEldestEntry method.
james
Unfortunately, this is wrong. In the absence of *happens-before* conditions (as defined by the language specification), words like "before", "while", and "after" have no meaning. Making the objects immutable using `final` is a good idea, so I won't downvote, but the idea that thread-safety can be obtained without a memory barrier is very, very wrong.
erickson
Can you provide an example based on what the OP described? If I'm wrong, the answer should be down voted.
Amir Afghani
I was assuming that the values are set prior to reading them.
Amir Afghani
@erickson, to be fair, the question didn't really address that. He asked if multiple threads could read the data safely, which they can, provided the data has been published safely (what you are getting at). So maybe "very, very wrong" is a little strong, and maybe you meant to say "yes, as long as the data is published safely".
james
I'd encourage you to read some articles by Brian Goetz. There is a section in this one that might explain the problem better than I can: http://www.ibm.com/developerworks/java/library/j-jtp08223/index.html#2.0
erickson
Yep, and here's a good one on using volatile for safe publication: http://www.ibm.com/developerworks/java/library/j-jtp06197.html
james
See also the "Visibility Hazards" section: http://www.ibm.com/developerworks/java/library/j-jtp0618.html#1c
erickson
+2  A: 

Just as an addendum to everyone else's answers: if you're sure you need to synchronize your array lists, you can call Collections.synchronizedList(myList) which will return you a thread safe implementation.
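
A minimal sketch of that wrapper in use. The class and the helper `removeLastIfPresent` are hypothetical names made up for illustration. Individual calls on the wrapped list are synchronized, but a compound "check, then act" sequence still has to hold the list's lock for its whole duration, which is what the `synchronized` block does here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListDemo {
    // Hypothetical helper: removes and returns the last element, or null if the list is empty.
    // The emptiness check and the removal happen under one lock, so no other thread
    // can shrink the list in between.
    static <T> T removeLastIfPresent(List<T> syncList) {
        synchronized (syncList) {
            if (syncList.isEmpty()) {
                return null;
            }
            return syncList.remove(syncList.size() - 1);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        // Two threads drain the same list concurrently.
        Runnable drain = () -> {
            while (removeLastIfPresent(list) != null) {
                // keep removing until the list is empty
            }
        };
        Thread a = new Thread(drain);
        Thread b = new Thread(drain);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("empty after draining: " + list.isEmpty());
    }
}
```

This assumes the list was created with `Collections.synchronizedList(...)`; locking on a plain `ArrayList` would only help if every other access used the same lock too.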

Alex Beardsley
There are corner cases where this doesn't really guarantee thread safety. What if two threads each check the size of the list and then remove something?
Chandru
@Chandru that sort of atomicity requires a synchronized block, but it is only guaranteed by the `synchronizedList()` wrapper; trying to do the same thing with an `ArrayList` directly would be unreliable.
erickson
+1  A: 

I cannot see how reading from ArrayLists, Strings and primitive values using multiple threads should be any problem.

As long as you are only reading, no synchronization should be necessary. For Strings and primitives it is certainly safe as they are immutable. For ArrayLists it should be safe, but I do not have it on authority.

Hans W
Strings are internally immutable, but the variable can change out from under you. What you should be stating is that the variables that point to Strings and primitives should be marked 'final'.
fuzzy lollipop
A: 

Do NOT use java.util.Vector. If the collections truly are unmodifiable, use a java.util.Collections.unmodifiableXXX() wrapper; this guarantees they won't change and enforces that contract. If they are going to be modified, use java.util.Collections.synchronizedXXX(), but that only guarantees internal thread safety. Making the variables final will also help the compiler/JIT with optimizations.
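
A small sketch of the unmodifiable wrapper, assuming a list of strings. The class and the helper `readOnlyCopy` are hypothetical names. Since `Collections.unmodifiableList` returns a view over the backing list, the sketch copies the source first so that later changes to the original cannot show through.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReadOnlyViewDemo {
    // Hypothetical helper: takes a private copy, then wraps it in a read-only view.
    static List<String> readOnlyCopy(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("alpha");
        source.add("beta");

        List<String> readOnly = readOnlyCopy(source);
        source.add("gamma"); // does not affect the copy

        try {
            readOnly.add("delta"); // mutators on the wrapper throw
        } catch (UnsupportedOperationException e) {
            System.out.println("modification rejected");
        }
        System.out.println(readOnly);
    }
}
```

The wrapper enforces the read-only contract; it does not by itself decide how the reference gets handed to other threads.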

fuzzy lollipop
A: 

Amir Afghani is right: since you are only reading from the structures, you should not run into any issues. If you need to modify the structure, then you should look into the classes provided in the Java concurrency package (java.util.concurrent).

instanceofTom
+1  A: 

If the data is never modified after it's created, then you should be fine and reads will be thread safe.

To be on the safe side, you could make all of the data members "final" and make all of the accessing functions reentrant where possible; this ensures thread safety and can help keep your code thread safe if you change it in the future.

In general, making as many members "final" as possible helps reduce the introduction of bugs, so many people advocate this as a Java best practice.
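
As an illustration only, here is a sketch of a deep structure built from `final` members. `TreeNode` and its fields are hypothetical names, not the OP's classes. The constructor takes a defensive copy of the child list and wraps it as unmodifiable, so nothing can change the node after construction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable node: every field is final and set once in the constructor.
final class TreeNode {
    private final String name;
    private final int weight;
    private final List<TreeNode> children;

    TreeNode(String name, int weight, List<TreeNode> children) {
        this.name = name;
        this.weight = weight;
        // Defensive copy plus read-only wrapper: callers cannot alter the list later.
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    String name() {
        return name;
    }

    List<TreeNode> children() {
        return children;
    }

    // Recursive read-only traversal; touches no shared mutable state.
    int totalWeight() {
        int sum = weight;
        for (TreeNode child : children) {
            sum += child.totalWeight();
        }
        return sum;
    }

    public static void main(String[] args) {
        TreeNode leafA = new TreeNode("a", 1, Collections.<TreeNode>emptyList());
        TreeNode leafB = new TreeNode("b", 2, Collections.<TreeNode>emptyList());
        TreeNode root = new TreeNode("root", 3, Arrays.asList(leafA, leafB));
        System.out.println(root.totalWeight());
    }
}
```

The sketch assumes the `this` reference does not escape during construction and that the child nodes are themselves immutable.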

Tom
+1 for being the only person here to mention "final"
Steve B.
my answer talks about final 7 mins before :-)
fuzzy lollipop
Like Amir Afghani, the idea that data can be shared across threads without a memory barrier is wrong and dangerous. But, also like Amir, you have a good idea with `final`.
erickson
But Erickson, I didn't say that it's OK to share data across threads without a memory barrier without the qualification that the data is not changing. Please explain how this is dangerous.
Amir Afghani
@Erickson: I too have the same question as Amir. ;-) Why is it dangerous?
Tom
@fuzzy lollipop: I probably type slower than you do. ;-)
Tom
The most informative explanation is the Java Language Specification, Chapter 17 (http://java.sun.com/docs/books/jls/third_edition/html/memory.html). But Brian Goetz's summary (http://www.ibm.com/developerworks/java/library/j-jtp0618.html#1c) may be more to-the-point. Note that in his example, "the data is not changing", but it's still not safe.
erickson
A: 

The members of an ArrayList aren't protected by any memory barriers, so there is no guarantee that changes to them are visible between threads. This applies even when the only "change" that is ever made to the list is its construction.

Any data that is shared between threads needs a "memory barrier" to ensure its visibility. There are several ways to accomplish this.

First, any member that is declared final and initialized in a constructor is visible to any thread after the constructor completes.

Changes to any member that is declared volatile are visible to all threads. In effect, the write is "flushed" from any cache to main memory, where it can be seen by any thread that accesses main memory.

Now it gets a bit trickier. Any writes made by a thread before that thread writes to a volatile variable are also flushed. Likewise, when a thread reads a volatile variable, its cache is cleared, and subsequent reads may repopulate it from main memory.

Finally, a synchronized block is like a volatile read and write, with the added quality of atomicity. When the monitor is acquired, the thread's read cache is cleared. When the monitor is released, all writes are flushed to main memory.

One way to make this work is to have the thread that is populating your shared data structure assign the result to a volatile variable (or an AtomicReference, or other suitable java.util.concurrent object). When other threads access that variable, not only are they guaranteed to get the most recent value for that variable, but also any changes made to the data structure by the thread before it assigned the value to the variable.
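
A minimal sketch of that publication pattern, under the assumption that the structure is fully built before the assignment and never modified afterwards. `PublishedData`, `publish` and `get` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

class PublishedData {
    // Volatile reference: stays null until the builder thread publishes the structure.
    private static volatile List<String> data;

    // Builds the structure completely in a local variable, then publishes it
    // with a single volatile write.
    static void publish() {
        List<String> local = new ArrayList<>();
        local.add("alpha");
        local.add("beta");
        data = local; // everything written above is visible to threads that read 'data' afterwards
    }

    // Volatile read: returns null or the fully built list.
    static List<String> get() {
        return data;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            List<String> seen;
            // Wait until the builder has published the reference.
            while ((seen = get()) == null) {
                Thread.yield();
            }
            System.out.println("reader sees " + seen.size() + " elements");
        });
        reader.start();
        publish();
        reader.join();
    }
}
```

An `AtomicReference<List<String>>` could stand in for the volatile field with the same effect. The key point in either case is that no thread touches the list after the publishing write.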

erickson
"The members of an ArrayList aren't protected by any memory barriers, so there is no guarantee that changes to them are visible between threads" - The OP explicitly states that the data is NOT changing.
Amir Afghani
@Amir Afghani - constructing the object in the first place is a "change" that needs to be visible to other threads. Here is another Brian Goetz article about safe construction: http://www.ibm.com/developerworks/java/library/j-jtp0618.html
erickson
Fascinating, thanks erickson. That's exactly the kind of gotcha I was afraid of. My reading of Pattern #2 on this page: http://www.ibm.com/developerworks/java/library/j-jtp06197.html confirms that your suggestion to build the tree and then have the read threads access it through a volatile variable is a really good idea. Thanks!
Jon M