Suppose that I have an integer variable in a class, and this variable may be concurrently modified by other threads. Writes are protected by a mutex. Do I need to protect reads too? I've heard that there are some hardware architectures on which, if one thread modifies a variable, and another thread reads it, then the read result will be garbage; in this case I do need to protect reads. I've never seen such architectures though.

This question assumes that a single transaction only consists of updating a single integer variable so I'm not worried about the states of any other variables that might also be involved in a transaction.
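For concreteness, here is a minimal sketch of the kind of class I mean (the names are made up for illustration):

#include <mutex>

class Counter {
public:
    void set(int v) {                               // writes take the mutex
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = v;
    }
    int get() const {                               // is this unlocked read safe?
        return value_;
    }
private:
    mutable std::mutex mutex_;
    int value_ = 0;
};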

+1  A: 

If you don't use the previous value of this variable when writing the new one, then:

  You can read and write the integer variable without using a mutex. This is because an int is a native, word-sized type on a 32-bit architecture, so every read or write of the value is done in a single operation.

But if you are doing something such as an increment:

myvar++;

  then you do need a mutex, because this construct expands to myvar = myvar + 1, and between the read of myvar and the write of the incremented value, myvar can be modified by another thread. In that case you will get a wrong value.
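As a hedged sketch of that read-modify-write case (using std::mutex here; any mutex API follows the same pattern, and the names are illustrative):

#include <mutex>

int myvar = 0;
std::mutex myvar_mutex;

void increment()
{
    // Lock around the whole read-modify-write, not just the store:
    // no other thread may change myvar between the read and the write.
    std::lock_guard<std::mutex> lock(myvar_mutex);
    myvar++;
}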

Vitaly Dyatlov
+3  A: 

Imagine that you're reading the variable in one thread, that thread gets interrupted while reading and the variable is changed by a writing thread. Now what is the value of the read integer after the reading thread resumes?

Unless reading the variable is an atomic operation, which in this case means it takes a single (assembly) instruction, you cannot ensure that the above situation won't happen. (If retrieving the value from memory takes more than one instruction, a write can land in between.)

The consensus is that you should lock every write individually, while reads can be executed concurrently with (only) other reads.

NomeN
A read does not need to take one clock cycle to be atomic. Your last part is unclear to me - I guess you mean "reads can be done concurrently, but during a write, other threads reading or writing must be locked"
peterchen
A: 

This may happen on 8 bit systems which use 16 bit integers.

If you want to avoid locking you can, under suitable circumstances, get away with reading multiple times until you get two equal consecutive values. For example, I've used this approach to read the 64-bit clock on a 32-bit embedded target, where the clock tick was implemented as an interrupt routine. In that case reading three times suffices, because the clock can only tick once in the short time the reading routine runs.
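A rough sketch of that read-until-stable idea, assuming a 64-bit tick counter that is only updated by a timer interrupt on a 32-bit target (the names are illustrative, not from a real project):

#include <cstdint>

volatile uint64_t clock_ticks;      // written only by the timer interrupt

uint64_t read_clock()
{
    // On a 32-bit CPU each 64-bit read is two loads, so the interrupt can
    // fire between them and tear the value. Re-read until two consecutive
    // reads agree; this assumes the clock ticks much slower than this loop.
    uint64_t a = clock_ticks;
    uint64_t b = clock_ticks;
    while (a != b) {
        a = b;
        b = clock_ticks;
    }
    return b;
}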

starblue
A: 

If a variable is marked with the volatile keyword then the read/write becomes atomic but this has many, many other implications in terms of what the compiler does and how it behaves and shouldn't just be used for this purpose.

Read up on what volatile does before you blindly start using it: http://msdn.microsoft.com/en-us/library/12a04hfd(VS.80).aspx

Jeff Tucker
volatile doesn't mean "atomic". It's compiler-dependent what volatile really means; today most compilers treat it as "don't do clever optimization tricks to this variable"
nos
The standard specifies 'volatile' as meaning that the variable may change or be accessed out of the thread's control, so it should never be cached in a register or similar. A variable marked volatile will always be read and written directly to main memory. However, it says nothing about synchronization.
jalf
@jalf: I believe a variable marked "volatile" must be read exactly once any time the C code indicates a read, even if it would be more efficient to read the variable zero times, or more times; and written exactly once any time the code indicates a write, even if it might be more efficient to write it more often (e.g. a=!b; might be most efficient as "a=0; if (!b) a=1;", but that would not be permitted if a is volatile). Is that still the current meaning?
supercat
A: 

Both reads and writes of variables accessed concurrently must be protected, for example by a critical section (not a mutex). Unless you want to waste your whole day debugging.

Critical sections are platform-specific, I believe. On Win32, a critical section is very efficient: when there is no contention, entering it is almost free and does not affect overall performance. Under contention it is still more efficient than a mutex, because it performs a series of checks before suspending the thread.
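For illustration, a minimal Win32 sketch of guarding both reads and writes with a CRITICAL_SECTION (the variable names are made up):

#include <windows.h>

CRITICAL_SECTION g_cs;   // InitializeCriticalSection(&g_cs) once at startup,
int g_value = 0;         // DeleteCriticalSection(&g_cs) at shutdown

void set_value(int v)
{
    EnterCriticalSection(&g_cs);
    g_value = v;
    LeaveCriticalSection(&g_cs);
}

int get_value()
{
    EnterCriticalSection(&g_cs);
    int v = g_value;
    LeaveCriticalSection(&g_cs);
    return v;
}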

SadSido
A: 

Depends on your platform. Most modern platforms offer atomic operations for integers: Windows has InterlockedIncrement, InterlockedDecrement, InterlockedCompareExchange, etc. These operations are usually supported by the underlying hardware (read: the CPU) and they are usually cheaper than using a critical section or other synchronization mechanisms.

See MSDN: InterlockedCompareExchange

I believe Linux (and modern Unix variants) support similar operations in the pthreads package but I don't claim to be an expert there.
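A small sketch of what this looks like on Windows (the variable name is illustrative); on other platforms the equivalent atomic primitives play the same role:

#include <windows.h>

volatile LONG g_counter = 0;

void add_one()
{
    InterlockedIncrement(&g_counter);   // atomic increment, returns the new value
}

LONG set_if_equal(LONG expected, LONG desired)
{
    // Atomically: if g_counter == expected, replace it with desired.
    // Returns the value that was in g_counter before the operation.
    return InterlockedCompareExchange(&g_counter, desired, expected);
}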

LaszloG
this does not answer the OP's question
NomeN
+6  A: 

You ask a question about reading a variable and later you talk about updating a variable, which implies a read-modify-write operation.

Assuming you really mean the former, the read is safe if it is an atomic operation. For almost all architectures this is true for integers.

There are a few (and rare) exceptions:

  • The read is misaligned, for example accessing a 4-byte int at an odd address. Usually you have to force the compiler with special attributes to produce such a misalignment (see the sketch below).
  • The size of an int is bigger than the natural word size of the architecture, for example using 16-bit ints on an 8-bit architecture.
  • Some architectures have an artificially limited bus width. I only know of very old and outdated ones, like the 386SX or the 68008.
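A sketch of how such a misaligned int usually comes about, using GCC/Clang packing attributes (the struct is made up for illustration):

#include <cstddef>
#include <cstdio>

// Packing removes the usual padding, so 'value' ends up at offset 1,
// i.e. at an odd address where a 4-byte read may not be atomic.
struct __attribute__((packed)) Record {
    char tag;
    int  value;
};

int main()
{
    std::printf("offset of value: %zu\n", offsetof(Record, value));   // prints 1
    return 0;
}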
drhirsch
Note that even if the read is atomic, it may still be a stale value from the current threads CPU's cache.
peterchen
No. Read about the MESI protocol. Cache coherency for normal read/write operations is always guaranteed and transparent to the software.
drhirsch
@drhirsch: Are you absolutely certain that no CPU exists that requires explicit cache control in the machine code? Because I am pretty sure I have read about such.
Zan Lynx
@drhirsch: Ahah! "Not so MIPS systems, where many MIPS cores have caches with no extra "coherence" hardware of any kind." - http://www.embedded.com/design/opensource/208800056?_requestid=187872
Zan Lynx
This article talks about I/O caches in single-processor systems with regard to DMA access. This has nothing to do with CPU cache coherency in multiprocessor systems. Do you understand the difference?
drhirsch
And yes, I am absolutely sure that there is no multiprocessor system without CPU cache coherency. The simple reason is that you can't do cache coherency in software without completely negating the effect of the CPU cache, because you would need to do some bookkeeping of the CPU cache's content addresses in an uncached area of the memory.
drhirsch
The next part 6 does talk about CPU caches. At any rate, I happen to know there are systems with multiple CPUs without hardware cache coherence. Perhaps you have never seen one, but they do exist. You *will* get into serious bugs if the correct hardware instructions are not used with shared data. Another quote from Part 6 of that series I linked: "But once you get away from the systems built round a single common bus, it's hard to make sure even that reads and writes stay in the same relative order."
Zan Lynx
And now I realize we may be talking about different things when we say "hardware cache coherence." I am not saying that the hardware does not sync the caches and memory. I am saying that this is not done without being asked to do it. It is not always done automatically.
Zan Lynx
Careful. CPU cache coherency is __not__ about syncing memory and cache; that is an absolutely basic functionality which has to be built into __any__ cache. It is about syncing different caches on different CPUs. And if you follow part 6 of the article, it talks about MESI and other things I mentioned before, which are used to achieve this synchronization. In large systems it may be beneficial to avoid the overhead of bus snooping and so use uncacheable shared memory areas and/or local memory. But in any case you won't get "a different value from the other CPU cache".
drhirsch
In the first case you don't have a cache at all for the memory operation in question; in the second case the program described in the original question would be ill-formed - of course you can't read the same data from different local memories. A more precise formulation: if you use shared memory in a multi-CPU environment, you will have hardware to __automatically__ sync the CPU caches, or you won't have any caches at all.
drhirsch
+2  A: 

Suppose that I have an integer variable in a class, and this variable may be concurrently modified by other threads. Writes are protected by a mutex. Do I need to protect reads too? I've heard that there are some hardware architectures on which, if one thread modifies a variable, and another thread reads it, then the read result will be garbage; in this case I do need to protect reads. I've never seen such architectures though.

In the general case, that is potentially every architecture. Every architecture has cases where reading concurrently with a write will result in garbage. However, almost every architecture also has exceptions to this rule.

It is common that word-sized variables are read and written atomically, so synchronization is not needed when reading or writing. The proper value will be written atomically as a single operation, and threads will read the current value as a single atomic operation as well, even if another thread is writing. So for integers, you're safe on most architectures. Some will extend this guarantee to a few other sizes as well, but that's obviously hardware-dependent.

For non-word-sized variables both reading and writing will typically be non-atomic, and will have to be synchronized by other means.
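If you would rather not depend on the word-size guarantee of a particular architecture, one portable option (shown only as a hedged sketch) is to make the atomicity explicit with std::atomic:

#include <atomic>

std::atomic<int> shared_value{0};

void writer(int v)
{
    shared_value.store(v);      // atomic store on every platform
}

int reader()
{
    return shared_value.load(); // never observes a torn value
}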

jalf
A: 

In general, each machine instruction goes through several hardware stages when executing. Since most current CPUs are multi-core or hyper-threaded, reading a variable may start it moving through the instruction pipeline, but that doesn't stop another CPU core or hyper-thread from concurrently executing a store instruction to the same address. The two concurrently executing instructions, read and store, might "cross paths", meaning that the read will receive the old value just before the new value is stored.
To sum up: you do need the mutex for both reads and writes.

harrymc
A: 

While it would probably be safe to read ints on 32-bit systems without synchronization, I would not risk it. While multiple concurrent reads are not a problem, I do not like writes to happen at the same time as reads.

I would recommend placing the reads in the critical section too, and then stress-testing your application on multiple cores to see if this causes too much contention. Finding concurrency bugs is a nightmare I prefer to avoid. What happens if in the future someone decides to change the int to a long long or a double, so it can hold larger numbers?

If you have a nice thread library like boost.thread or zthread then you should have reader/writer locks. These would be ideal for your situation as they allow multiple reads while protecting writes (see the sketch below).
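A rough sketch of that reader/writer pattern, shown here with the standard library's shared_mutex (boost.thread offers an equivalent; the class and member names are illustrative):

#include <mutex>
#include <shared_mutex>

class SharedCounter {
public:
    int get() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);   // many readers at once
        return value_;
    }
    void set(int v) {
        std::unique_lock<std::shared_mutex> lock(mutex_);   // writers are exclusive
        value_ = v;
    }
private:
    mutable std::shared_mutex mutex_;
    int value_ = 0;
};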

iain
+15  A: 

atomic read
As said before, it's platform-dependent. On x86, the value must be aligned on a 4-byte boundary. Generally, for most platforms, the read must execute in a single CPU instruction.

optimizer caching
The optimizer doesn't know you are reading a value modified by a different thread. Declaring the value volatile helps with that: the optimizer will issue a memory read/write for every access, instead of trying to keep the value cached in a register.

CPU cache
Still, you might read a stale value, since on modern architectures you have multiple cores with individual caches that are not kept in sync automatically. You need a read memory barrier, usually a platform-specific instruction.

On Wintel, thread synchronization functions will automatically add a full memory barrier, or you can use the InterlockedXxxx functions.

MSDN: Memory and Synchronization issues, MemoryBarrier Macro
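A hedged Win32-flavoured sketch of the pattern described above (the names are made up; whether the explicit barrier is strictly necessary is exactly what the comments below discuss):

#include <windows.h>

volatile LONG g_value = 0;   // volatile: the optimizer must not cache it in a register

void write_value(LONG v)
{
    InterlockedExchange(&g_value, v);   // atomic store, acts as a full memory barrier
}

LONG read_value()
{
    MemoryBarrier();                    // conservative read barrier, as suggested above
    return g_value;                     // an aligned 32-bit read is a single instruction
}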

[edit] please also see drhirsch's comments.

peterchen
+1 because of memory barrier/CPU Cache mention as well as optimizer caching, which no one else here seems to acknowledge.
paercebal
Not really the answer I'm looking for, but you seem to have the best answer.
Hongli
-1 for being completely wrong on the CPU cache issue. Google the MESI protocol. Memory barriers are for weakly ordered load/store instructions (on x86 usually streaming MMX), which is a completely different topic.
drhirsch
drhirsch - (1) do all platforms guarantee cache consistency? (2) Isn't the net effect of reordering of instructions the same?
peterchen
(1) Yes. There is no multiprocessor architecture without cache coherency, simply because it is required. There are other protocols than MESI, but this is the most widespread one. (2) You seem to confuse reordering of instructions, reordering of memory operations, and concurrent access to the same data. These are all different. On most architectures instructions and the corresponding memory operations are not coupled; the processor may reorder memory operations within certain limits for efficiency (out-of-order architecture). One of the rare counterexamples is the Intel Atom.
drhirsch
But this usually doesn't concern the programmer, unless you use instructions which have "weak memory ordering". You need assembler instructions or intrinsics to use these, so if you are using a high-level language you will never have to use memory barriers. So far this has absolutely nothing to do with multiprocessors and their caches. If two processors now access the same data, which resides in one of the processors' caches, the hardware makes sure that both processors get the very same data, which may even be transferred from one processor's cache to the other.
drhirsch
The "reordering of instructions" isn't really relevant in that case, this is a local optimization a processor can do. If you don't use synchronization and have a write operation in one of your processors the result is simply undefined. - The MSDN is really not the right place to look up lowlevel instructions, better check Intels or AMDs manuals or whatever you are using.
drhirsch
+5  A: 

I'd recommend not relying on any particular compiler or architecture in this case.
Whenever you have a mix of readers and writers (as opposed to just readers or just writers), you'd better sync them all. Imagine your code running someone's artificial heart: you don't really want it to read wrong values, and you surely don't want a power plant in your city to go 'boom' because someone decided not to use that mutex. Save yourself a sleepless night in the long run: sync 'em.
If you only have one thread reading, you're good to go with just that one mutex; however, if you're planning for multiple readers and multiple writers, you'll need a more sophisticated piece of code to sync them. A nice implementation of a read/write lock that is also 'fair' is yet to be seen by me.

Dmitry