This is a complex question, please consider carefully before answering.

Consider this situation. Two threads (a reader and a writer) access a single global int. Is this safe? Normally, I would respond without thought, yes!

However, it seems to me that Herb Sutter doesn't think so. In his articles on effective concurrency he discusses a flawed lock-free queue and the corrected version.

At the end of the first article and the beginning of the second he discusses a rarely considered trait of variables: write ordering. Ints are atomic, good, but writes to ints aren't necessarily ordered, which could destroy any lock-free algorithm, including my scenario above. I fully agree that the only way to guarantee correct multithreaded behavior on all platforms, present and future, is to use atomics (AKA memory barriers) or mutexes.

My question: is write re-ordering ever a problem on real hardware? Or is the multithreaded paranoia just being pedantic?
What about classic uniprocessor systems?
What about simpler RISC processors like an embedded PowerPC?

Clarification: I'm more interested in what Mr. Sutter said about the hardware (processor/cache) reordering variable writes. I can stop the optimizer from breaking code with compiler switches or hand inspection of the assembly post-compilation. However, I'd like to know if the hardware can still mess up the code in practice.

+5  A: 

Like you said, because of reordering done at cache or processor level, you actually do need some sort of memory barrier to ensure proper synchronisation, especially for multi-processors (and especially on non-x86 platforms). (I am given to believe that single-processor systems don't have these issues, but don't quote me on this---I'm certainly more inclined to play safe and do the synchronised access anyway.)

Chris Jester-Young
These issues already exist even on single-processor systems. For example, PowerPC 60x based cores are perfectly capable of re-ordering IO due to having multiple execution units in each core. This is exactly why the EIEIO, SYNC, and ISYNC instructions are needed.
Tall Jeff
Yeah, even if the code worked on a single-processor, it'd be fragile code design that starts failing mysteriously when you upgrade to a multi-processor machine.
Bill Karwin
Or even a smarter single-processor. With multithreading, anything that is not explicitly guaranteed can be expected to change mysteriously at some point in the future.
Eclipse
Yes I don't see why this has to be constrained to multi-core single proc machines. The whole point is that on a single core single proc machine, instructions can be re-ordered at the microcode level.
ApplePieIsGood
+8  A: 

Yup - use memory barriers to prevent instruction reordering where needed. In some C++ compilers, the volatile keyword has been expanded to insert implicit memory barriers for every read and write - but this isn't a portable solution. (Likewise with the Interlocked* win32 APIs). Vista even adds some new finer-grained Interlocked APIs which let you specify read or write semantics.

Unfortunately, C++ has such a loose memory model that any kind of code like this is going to be non-portable to some extent and you'll have to write different versions for different platforms.
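
For illustration only, here is a minimal sketch of what an explicit barrier looks like in portable form. It assumes a toolchain with the C++0x (C++11) <atomic> and <thread> headers, which postdates the compilers discussed above; the names payload, ready, writer, reader and run_fence_demo are made up for the example. The release fence keeps the write to payload from being reordered after the flag store, and the acquire fence keeps the read of payload from being reordered before the flag load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain (non-atomic) data to be published
std::atomic<bool> ready{false};  // flag that announces the data is written

void writer() {
    payload = 42;
    // Write barrier: the payload write may not move below this point.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int reader() {
    // Spin until the writer has raised the flag.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Read barrier: the payload read may not move above this point.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Hypothetical driver: runs one writer and one reader and returns
// the value the reader observed.
int run_fence_demo() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread r([&] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    assert(seen == 42);  // guaranteed by the fence pairing above
    return seen;
}
```

Without the two fences, the language permits both the compiler and the hardware to reorder the accesses, and the reader could see ready set while still reading a stale payload.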

Eclipse
FWIW, C++0x will introduce a portable mechanism for writing threaded, thread-safe code (inspired by the boost.thread library).
Shog9
Hallelujah! Of course it will be another 5-10 years before C++0x code can be considered portable...
Eclipse
But will the C++0x features address the memory model issue? This is not a threading issue per se, so I fail to see what these new features (or the existing ones in boost) have to offer here. This is an issue with instruction order.
ApplePieIsGood
Yup - C++0x introduces a more well-defined memory model, as well as atomic<> types that are explicitly ok to modify without using locks. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html and http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2429.htm
Eclipse
+21  A: 

Your idea of inspecting the assembly is not good enough; the reordering can happen at the hardware level.

To answer your question "is this ever a problem on real hardware": Yes! In fact I've run into that problem myself.

Is it OK to skirt the issue with uniprocessor systems or other special-case situations? I would argue "no" because five years from now you might need to run on multi-core after all, and then finding all these locations will be tricky (impossible?).

One exception: software designed for embedded hardware applications, where you do have complete control over the hardware. In fact I have "cheated" like this in those situations on e.g. an ARM processor.

Jason Cohen
+3  A: 

It is a problem on real hardware. A friend of mine works for IBM and makes his living primarily by sussing out this kind of problem in customers' code.

If you want to see how bad things can get, search for academic papers on the Java Memory Model (and also now the C++ memory model). Given the reordering that real hardware can do, trying to figure out what's safe in a high-level language is a nightmare.

Norman Ramsey
+4  A: 

is this ever a problem on real hardware?

Absolutely, particularly now with the move to multiple cores for current and future CPUs. If you depend on ordered atomicity to implement features in your application, and you cannot guarantee it under all conditions via your chosen platform or synchronization primitives (e.g. when a customer moves from a single-core CPU to a multi-core CPU), then you are just waiting for a problem to occur.

Quoting from the second of the Herb Sutter articles referred to:

Ordered atomic variables are spelled in different ways on popular platforms and environments. For example:

  • volatile in C#/.NET, as in volatile int.
  • volatile or Atomic* in Java, as in volatile int, AtomicInteger.
  • atomic<T> in C++0x, the forthcoming ISO C++ Standard, as in atomic<int>.

I have not seen how C++0x implements ordered atomicity, so I can't say whether the upcoming feature is a pure library implementation or also relies on changes to the language. You could review the proposal to see whether it can be incorporated as a non-standard extension to your current tool chain until the new standard is available; it may even be available already for your situation.
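
As a rough sketch of the atomic<int> spelling from the quoted list, assuming a compiler that ships the C++0x (C++11) <atomic> and <thread> headers: one thread fills plain slots and publishes a count through an ordered atomic, the other consumes only what has been published. The names slots, published, produce, consume and run_publish_demo are invented for this example and do not come from the article.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

constexpr int kItems = 1000;
int slots[kItems];              // plain data, one slot per item
std::atomic<int> published{0};  // ordered atomic: number of slots ready

void produce() {
    for (int i = 0; i < kItems; ++i) {
        slots[i] = i * 2;
        // Default ordering is sequentially consistent, so the slot
        // write above cannot be reordered after this store.
        published.store(i + 1);
    }
}

long consume() {
    long sum = 0;
    for (int i = 0; i < kItems; ++i) {
        // Wait until slot i has been published before touching it.
        while (published.load() <= i) {
            std::this_thread::yield();
        }
        sum += slots[i];
    }
    return sum;
}

// Hypothetical driver: runs one producer and one consumer and
// returns the sum the consumer computed.
long run_publish_demo() {
    published.store(0);
    long sum = 0;
    std::thread c([&] { sum = consume(); });
    std::thread p(produce);
    p.join();
    c.join();
    assert(published.load() == kItems);
    return sum;
}
```

If published were a plain int, the slot write and the count write could be reordered by the compiler or the hardware, which is the failure mode the question asks about.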

Henk
+4  A: 

We have run into the problem, albeit on Itanium processors where the instruction reordering is more aggressive than x86/x64.

The fix was to use an Interlocked instruction, since there was (at the time) no way of telling the compiler to simply put a write barrier after the assignment.

We really need a language extension to deal with this cleanly. Use of volatile (if supported by the compiler) is too coarse-grained for the cases where you are trying to squeeze as much performance out of a piece of code as possible.

Rob Walker
A: 

The answer to the question "is it safe" is inherently ambiguous.

It's always safe, even for doubles, in the sense that your computer won't catch fire. It's safe in the sense that you will always get a value that the int held at some time in the past. It's not safe in the sense that you may get a value that is being, or is about to be, updated by another thread.

"Atomic" means that you get the second guarantee. Since double usually is not atomic, you could get 32 old and 32 new bits. That's clearly unsafe.
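
To make the tearing point concrete, here is a small sketch assuming a C++11 toolchain, where std::atomic<double> provides the "second guarantee" for a double: a load returns a value that some store actually wrote, never half of one and half of another. The names reading and observe_only_whole_values are illustrative, and the two constants are simply values with unrelated bit patterns.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<double> reading{0.1};

// Returns true if every load saw one of the two stored values in full.
bool observe_only_whole_values(int iterations) {
    assert(iterations > 0);
    std::atomic<bool> stop{false};

    // Writer: keeps flipping between two very different doubles.
    std::thread w([&] {
        while (!stop.load()) {
            reading.store(0.1);
            reading.store(-1e100);
        }
    });

    bool ok = true;
    for (int i = 0; i < iterations; ++i) {
        double v = reading.load();
        // A torn read would produce some third value.
        if (v != 0.1 && v != -1e100) {
            ok = false;
        }
    }

    stop.store(true);
    w.join();
    return ok;
}
```

With a plain double on a platform that writes it as two 32-bit halves, the same loop could in principle observe a mixed value; a passing run of such a test would still not prove the plain version safe.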

MSalters
+1  A: 

When I asked the question I was most interested in uniprocessor PowerPC. In one of the comments InSciTek Jeff mentioned the PowerPC SYNC and ISYNC instructions. Those were the key to a definitive answer. I found it here on IBM's site.

The article is large and pretty dense, but the takeaway is: no, it is not safe. On older PowerPCs the memory optimizers were not sophisticated enough to cause problems on a uniprocessor. However, the newer ones are much more aggressive and can break even simple access to a global int.

caspin
+2  A: 

No, this isn't safe, and there is real hardware available that exhibits this problem. For example, the memory model of the PowerPC chip in the Xbox 360 allows writes to be reordered. This is exacerbated by the lack of barriers in the intrinsics; see this article on MSDN for more details.

Rick