Suppose I have a deeply immutable type (all members are readonly, and any reference-type members also refer to objects that are deeply immutable).

I would like to implement a lazily initialized property on the type, like this:

private ReadOnlyCollection<SomeImmutableType> m_PropName = null;
public ReadOnlyCollection<SomeImmutableType> PropName
{
    get
    {
        if(null == m_PropName)
        {
            ReadOnlyCollection<SomeImmutableType> temp = /* do lazy init */;
            m_PropName = temp;
        }
        return m_PropName;
    }
}

From what I can tell:

m_PropName = temp;

...is thread-safe. I'm not too worried about two threads racing to initialize at the same time: it will be rare, both results would be identical from a logical perspective, and I'd rather not use a lock if I don't have to.

Will this work? What are the pros and cons?

Edit: Thanks for your answers. I will probably move forward with using a lock. However, I'm surprised nobody brought up the possibility of the compiler realizing that the temp variable is unnecessary, and just assigning straight to m_PropName. If that were the case, then a reading thread could possibly read an object that hasn't finished being constructed. Does the compiler prevent such a situation?

(Answers seem to indicate that the runtime won't allow this to happen.)

Edit: So I've decided to go with an Interlocked.CompareExchange approach inspired by this article by Joe Duffy.

Basically:

private ReadOnlyCollection<SomeImmutableType> m_PropName = null;
public ReadOnlyCollection<SomeImmutableType> PropName
{
    get
    {
        if(null == m_PropName)
        {
            ReadOnlyCollection<SomeImmutableType> temp = /* do lazy init */;
            System.Threading.Interlocked.CompareExchange(ref m_PropName, temp, null);
        }
        return m_PropName;
    }
}

This is supposed to ensure that all threads that call this method on this object instance will get a reference to the same object, so the == operator will work. It is possible to have wasted work, which is fine - it just makes this an optimistic algorithm.

As noted in some comments below, this depends on the .NET 2.0 memory model to work. Otherwise, m_PropName should be declared volatile.
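
For reference, here is a hedged sketch of what that volatile variant could look like (the member names come from the snippet above; BuildPropName() is a hypothetical lazy-init helper). Passing a volatile field by ref produces compiler warning CS0420, which is harmless here because the Interlocked methods already issue a full memory barrier:

private volatile ReadOnlyCollection<SomeImmutableType> m_PropName = null;
public ReadOnlyCollection<SomeImmutableType> PropName
{
    get
    {
        if(null == m_PropName)
        {
            ReadOnlyCollection<SomeImmutableType> temp = BuildPropName(); // hypothetical lazy-init helper
#pragma warning disable 420 // "a reference to a volatile field will not be treated as volatile"
            System.Threading.Interlocked.CompareExchange(ref m_PropName, temp, null);
#pragma warning restore 420
        }
        return m_PropName;
    }
}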

+5  A: 

I'd be interested to hear other answers to this, but I don't see a problem with it. The duplicate copy will simply be abandoned and garbage collected.

You need to make the field volatile though.

Regarding this:

However, I'm surprised nobody brought up the possibility of the compiler realizing that the temp variable is unnecessary, and just assigning straight to m_PropName. If that were the case, then a reading thread could possibly read an object that hasn't finished being constructed. Does the compiler prevent such a situation?

I considered mentioning it but it makes no difference. The new operator doesn't return a reference (and so the assignment to the field doesn't happen) until the constructor completes - this is guaranteed by the runtime, not the compiler.

However, the language/runtime does NOT really guarantee that other threads cannot see a partially constructed object - it depends on what the constructor does.

Update:

The OP also wonders whether this page has a helpful idea. Its final code snippet is an instance of double-checked locking, which is the classic example of an idea that thousands of people recommend to each other without any idea of how to do it right. The problem is that SMP machines consist of several CPUs, each with its own memory cache. If they had to synchronize their caches every time there was a memory update, this would undo the benefits of having several CPUs. So they only synchronize at a "memory barrier", which occurs when a lock is taken, when an interlocked operation occurs, or when a volatile variable is accessed.

The usual order of events is:

  • Coder discovers double-checked locking
  • Coder discovers memory barriers

Between these two events, they release a lot of broken software.

Also, many people believe (as that guy does) that you can "eliminate locking" by using interlocked operations. But at runtime they are a memory barrier and so they cause all CPUs to stop and synchronize their caches. They have an advantage over locks in that they don't need to make a call into the OS kernel (they are "user code" only), but they can kill performance just as much as any synchronization technique.

Summary: threading code looks approximately 1000x easier to write than it is.

Daniel Earwicker
The duplicate copy has already been passed to a caller as a reference, however, so it cannot be GC'd until that reference is abandoned or cleared.
Eddie
Yes, but the number of duplicates is never larger than the number of threads, and is almost always going to be zero. As long as a small number of duplicates is okay, there's no problem.
Daniel Earwicker
This depends on the consequences. If this introduces a rare race condition into your code, the person doing the debugging will not agree that there is no problem.
Eddie
Any design has the flaw that the implementation may contain bugs. This is certainly more likely to be true of this design than some others. But IF the identity of the returned object is definitely not relied upon, there's no problem.
Daniel Earwicker
Yes, if you can **guarantee** that no-one will ever use == and if you can **guarantee** that the returned object's identity will never be relied upon, then there probably won't be a problem. I hope no team member of mine would ever make such a speculative choice just to avoid synchronization.
Eddie
And if you **do** make the choice to use the OP's kind of coding, you'd better document the heck out of it in the code's internal documentation so future maintainers are aware of reasonable optimizations that can never be used.
Eddie
Please take a look at the extended question that I posed in my Edit of the OP. I wonder if that scenario is possible. Either way, I will use a lock. I can't see a scenario now where I would use the == operator, but I'll opt for consistency over performance.
Scott Whitlock
That's exactly the right choice - Eddie's advice is right about the practical use of this kind of trick. Note that you could overload == so it throws an exception! But that's not as good as a compile time ban, which I don't think is possible.
Daniel Earwicker
I think the code on this page: http://blog.markrendle.net/post/Lazy-initialization-thread-safety-and-my-favourite-operator.aspx using the CompareExchange probably accomplishes what I'm looking for safely and doesn't require every thread get a lock on every read.
Scott Whitlock
So, I'm confused. Are you saying the only right way to do it is with a lock, or are you saying that the Interlocked CompareExchange will work, but is probably as costly as a lock anyway?
Scott Whitlock
The first thing I suggested yesterday - "You need to make the field volatile though" - that will introduce the necessary memory barrier. But it will (as a result) impact performance. Ultimately, it is impossible to totally eliminate all the impact of synchronisation. It MAY be that the corrected...
Daniel Earwicker
... double-checked locking makes a significant difference for you, but it may not - the only way to find out is realistic profiling. Also bear in mind that the CompareExchange solution actually allocates a brand new instance of the data structure for every access, just in case it's the first access!
Daniel Earwicker
I posted a comment on that blog post - see his response.
Daniel Earwicker
"Also bear in mind that the CompareExchange solution actually allocates a brand new instance of the data structure for every access, just in case it's the first access!" Please see the second edit in my original post. I believe this addresses that issue.
Scott Whitlock
Yes it does, but the thing you still haven't put is 'volatile' on the declaration of the field. It's broken without that.
Daniel Earwicker
Is it broken because a reading thread may unnecessarily initialize a new object when it doesn't need to, or is it broken because the method could somehow return a null value? Please explain.
Scott Whitlock
Actually I think my info is out of date, though I'm not entirely sure. Apparently in .NET 2.0 they changed the memory model specifically so that double-checked lock would be okay without volatile. The full explanation is here: http://msdn.microsoft.com/en-gb/magazine/cc163715.aspx
Daniel Earwicker
However, note the strange warning at the end "assume the weakest memory model possible, using volatile declarations instead of relying on implicit guarantees." So is it safe or not? Hard to tell. His advice is generally excellent though: avoid all this tricksy stuff as much as possible.
Daniel Earwicker
Great article. Thank you. And thanks for all the help, I really appreciate it.
Scott Whitlock
No problem, I learned something there too.
Daniel Earwicker
A: 

I am no C# expert, but as far as I can tell, this only poses a problem if you require that only one instance of ReadOnlyCollection is created. You say that the created object will always be the same and it doesn't matter if two (or more) threads do create a new instance, so I would say it is ok to do this without a lock.

One thing that might become a weird bug later is comparing the instances for identity: two callers can end up holding collections that are logically equal but are not the same object. If you keep that in mind (or just never compare them with ==), I see no other problems.
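
To make the identity pitfall concrete, here is a small standalone illustration (a hedged sketch, using int elements for brevity): two logically identical collections created separately are never reference-equal, even though their contents match.

using System;
using System.Collections.ObjectModel;
using System.Linq;

class IdentityPitfall
{
    static void Main()
    {
        // Two logically identical collections, as two racing threads might create:
        var a = new ReadOnlyCollection<int>(new[] { 1, 2, 3 });
        var b = new ReadOnlyCollection<int>(new[] { 1, 2, 3 });

        Console.WriteLine(a == b);             // False: == compares references here, and these are different instances
        Console.WriteLine(a.SequenceEqual(b)); // True: element-by-element comparison of the contents
    }
}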

Simon Lehmann
I would say that the extra performance is not worth the risk of obscure bugs. Yes, you can say, "Just don't compare with ==" but if the object is marked as being immutable, eventually some maintainer will do just that.
Eddie
+4  A: 

That will work. Writing to references in C# is guaranteed to be atomic, as described in section 5.5 of the spec. It is still probably not a good way to do it, because the code becomes harder to read and debug in exchange for a probably minor effect on performance.

Jon Skeet has a great page on implementing singletons in C#.
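
For context, here is a hedged sketch of one of the simpler thread-safe lazy patterns from that family (not quoted from the linked page): the CLR runs a type's static initializer exactly once, so a nested holder type gives lazy, thread-safe construction without any explicit locks in user code.

public sealed class Singleton
{
    private Singleton() { }

    public static Singleton Instance
    {
        get { return Holder.instance; }
    }

    private static class Holder
    {
        // The explicit static constructor keeps the type from being marked
        // beforefieldinit, so initialization is deferred until first access.
        static Holder() { }

        // The CLR initializes this exactly once, under its own type-initialization lock.
        internal static readonly Singleton instance = new Singleton();
    }
}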

The general advice about small optimizations like these is not to do them unless a profiler tells you this code is a hotspot. Also, you should be wary of writing code that cannot be fully understood by most programmers without checking the spec.

EDIT: As noted in the comments, even though you say you don't mind if 2 versions of your object get created, that situation is so counter-intuitive that this approach should never be used.

RossFabricant
As you say, it will work (on the surface), but it's a bad idea for so many reasons. It should be discouraged.
Eddie
Actually, the article you quote explicitly says that the code above is a bad way to do things: "I wouldn't use solution 1 because it's broken". Yes, writing to a reference is atomic, but the lazy initialization most definitely IS NOT.
Eddie
Agree with Eddie. This "solution" is broken.
codekaizen
Unfortunately, it will NOT work. It has a race condition in it. OK, it will work 99.99% of the time, because the race condition is quite unlikely to occur, but it still won't work 100% of the time (see the link in my answer).
Thomas Danecker
+3  A: 

You should use a lock. Otherwise you risk two instances of m_PropName existing, each in use by a different thread. This may not be a problem in many cases; however, if you want to be able to use == instead of .equals(), it will be. Rare race conditions are not the better bug to have: they are difficult to debug and to reproduce.

In your code, if two different threads simultaneously get your property PropName (say, on a multi-core CPU), then they can receive different new instances of the property that will contain identical data but not be the same object instance.

One key benefit of immutable objects is that == is equivalent to .equals(), allowing use of the more performant == for comparison. If you don't synchronize in the lazy initialization, then you risk losing this benefit.

You also lose immutability. Your object will be initialized twice with different objects (that contain the same values), so a thread that already got the value of your property, but that gets it again, may receive a different object the second time.
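
For illustration, a minimal lock-based sketch using the question's member names (the sync object and the BuildPropName() helper are assumptions) could look like this; taking the lock on every read is the simplest correct option:

private readonly object m_PropNameLock = new object();
private ReadOnlyCollection<SomeImmutableType> m_PropName;

public ReadOnlyCollection<SomeImmutableType> PropName
{
    get
    {
        lock (m_PropNameLock)
        {
            if(null == m_PropName)
            {
                m_PropName = BuildPropName(); // hypothetical lazy-init helper
            }
            return m_PropName;
        }
    }
}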

Eddie
A: 

This is definitely a problem.

Consider this scenario: Thread "A" accesses the property and initializes the collection. Before it assigns its local instance to the field "m_PropName", Thread "B" accesses the property, and this time the initialization runs to completion, so Thread "B"'s instance is assigned to "m_PropName". Thread "B" now holds a reference to that instance, which is currently stored in "m_PropName"... until Thread "A" continues, at which point "m_PropName" is overwritten by Thread "A"'s local instance.

There are now a couple of problems. First, Thread "B" no longer has the "correct" instance: the owning object thinks "m_PropName" is the only instance, yet it leaked out a different initialized instance when Thread "B" completed before Thread "A". Another problem arises if the underlying data changed between the times Thread "A" and Thread "B" built their instances; then you have inconsistent data. It could be even worse if you were observing or modifying the read-only collection internally (which, of course, you can't with ReadOnlyCollection, but could if you replaced it with some other implementation that you observe via events or modify internally but not externally).

codekaizen
I believe you missed the point about the object being immutable, and the returned collection being effectively immutable. The worst case, as some others have mentioned, is that you end up with two or more identical ReadOnlyCollection<> objects with identical objects in their collection.
Scott Whitlock
Er, right. Consider my last sentence. Couldn't one construe that I did grasp this point, but was trying to drive home the realities of multithreading, which the poster apparently needs to understand? It could be that the poster has other code to which this line of reasoning applies.
codekaizen
A: 

Unfortunately, you need a lock. There are a lot of quite subtle bugs when you do not lock properly. For a daunting example look at this answer.

Thomas Danecker
+1  A: 

I'm all for lazy init when the data may not always be accessed and it can take a good amount of resources to fetch or store the data.

I think there is a key concept being forgotten here: per the usual .NET design guidelines, you should not make your instance members thread-safe by default; only static members should be made thread-safe by default. Unless you are accessing some static/global data, you should not add extra locks to your code.

From what your code shows, the lazy init is all inside an instance property, so I would not add locks to it. If, by design, it is meant to be accessed by multiple threads simultaneously, then go ahead and add the lock.

By the way, it may not reduce the code by much, but I am a fan of the null-coalescing operator. The body of your getter could become this instead:

m_PropName = m_PropName ?? new ...();
return m_PropName;


It gets rid of the extra "if (m_PropName == null) ..." and in my opinion makes it more concise and readable.

Erich Mirabal
Thanks for the comment. In this case, the overall data structure is immutable partly because it's going to be used extensively in a multi-threaded situation, so it is important. Also see: http://blog.markrendle.net/post/Lazy-initialization-thread-safety-and-my-favourite-operator.aspx
Scott Whitlock