To all the people worried about a partially constructed object:
As far as I understand, the problem of partially constructed objects is only a problem within constructors. In other words, within a constructor, if an object references itself (including it's subclass) or it's members, then there are possible issues with partial construction. Otherwise, when a constructor returns, the class is fully constructed.
I think you are confusing partial construction with the different problem of how the compiler optimizes the writes. The compiler can choose to A) allocate the memory for the new Helper object, B) write the address to myHelper (the local stack variable), and then C) invoke any constructor initialization. Anytime after point B and before point C, accessing myHelper would be a problem.
It is this compiler optimization of the writes, not partial construction that the cited papers are concerned with. In the original single-check lock solution, optimized writes can allow multiple threads to see the member variable between points B and C. This implementation avoids the write optimization issue by using a local stack variable.
The main scope of the cited papers is to describe the various problems with the double-check lock solution. However, unless the atomicSet method is also synchronizing against the Foo class, this solution is not a double-check lock solution. It is using multiple locks.
I would say this all comes down to the implementation of the atomic assignment function. The function needs to be truly atomic, it needs to guarantee that processor local memory caches are synchronized, and it needs to do all this at a lower cost than simply always synchronizing the getHelper method.
Based on the cited paper, in Java, it is unlikely to meet all these requirements. Also, something that should be very clear from the paper is that Java's memory model changes frequently. It adapts as better understanding of caching, garbage collection, etc. evolve, as well as adapting to changes in the underlying real processor architecture that the VM runs on.
As a rule of thumb, if you optimize your Java code in a way that depends on the underlying implementation, as opposed to the API, you run the risk of having broken code in the next release of the JVM. (Although, sometimes you will have no choice.)
dsimcha:
If your atomicSet method is real, then I would try sending your question to Doug Lea (along with your atomicSet implementation). I have a feeling he's the kind of guy that would answer. I'm guessing that for Java he will tell you that it's cheaper to always synchronize and to look to optimize somewhere else.