Hi!

A quick question I've been wondering about for some time: does the CPU assign values atomically, or bit by bit (say, for a 32-bit integer)?
If it's bit by bit, could another thread accessing that exact location get a "part" of the to-be-assigned value?

Think of this:
I have two threads and one shared "unsigned int" variable (call it "g_uiVal").
Both threads loop.
One prints "g_uiVal" with printf("%u\n", g_uiVal).
The second just increases the number.
Will the printing thread ever print something that is not, or is only part of, "g_uiVal"'s value?

In code:

unsigned int g_uiVal;

void thread_writer()
{
    while(1)
        g_uiVal++;
}

void thread_reader()
{
    while(1)
        printf("%u\n", g_uiVal);
}
+3  A: 

I believe the only correct answer is "it depends". On what, you may ask?

Well, for starters, which CPU. Some CPUs write word-width values atomically, but only when they are aligned. It is not something you can guarantee at the C language level.

Many compilers offer "intrinsics" to emit correct atomic operations. These are extensions which act like functions, but emit the correct code for your target architecture to get the needed atomic operations. For example: http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html

Evan Teran
+5  A: 

Depends on the bus widths of the CPU and memory. In a PC context, with anything other than a really ancient CPU, accesses up to 32 bits wide are atomic; 64-bit accesses may or may not be. In the embedded space, many (most?) CPUs are at most 32 bits wide with no provision for anything wider, so your int64_t is guaranteed to be non-atomic.

crazyscot
Is it possible to access 32-bit values that cross cache-lines these days?
Lasse V. Karlsen
@Lasse: Many modern desktop processors allow unaligned reads and writes, but at a significant performance penalty. Older or smaller CPUs (for e.g. embedded devices) tend not to. For some time this was divided between CISC processors (tended to support unaligned reads and writes) and RISC processors (didn't), but the distinctions are blurring here.
leander
8-, 16-, and 32-bit microcontrollers are all common. On an AVR (with 8-bit loads and stores) it is possible for an ISR (interrupt service routine) or another thread (if you are running a preemptible multitasking operating system) to write part of the variable, or to read only part of the change that the previous thread made.
nategoose
@crazyscot: Most embedded processors are 8 or 16 bit, and there are many 4-bit processors still in use. With those, a 32-bit operation is not atomic. On the other hand, there are 64-bit processors, too. I am currently working with the TI 64x series DSP, which is considered a 32-bit processor, but it can access internal data memory through a 64-bit data bus (actually 2 x 64-bit buses), and 64-bit (and maybe even 128-bit) operations are atomic.
PauliL
A: 

To add to what has been said so far: another potential concern is caching. CPUs tend to work with a local (on-die) memory cache which may or may not be immediately flushed back to main memory. If the box has more than one CPU, it is possible that another CPU will not see the changes for some time after the modifying CPU made them, unless there is some synchronization command informing all CPUs that they should synchronize their on-die caches. As you can imagine, such synchronization can slow processing down considerably.

mfeingold
But in this case, since there's no synchronization between consumer and producer, it doesn't really change the behavior. Sure, the consumer could read an old value, but it wouldn't be possible to tell whether that was due to unsynchronized on-die caches or just scheduling. What I'm getting at is that the consumer would never read a partially written value due to unsynchronized caches.
Isak Savo
It all depends on what's expected. If the intention is to produce unique values, this problem can introduce duplicates.
mfeingold
A: 

Don't forget that the compiler assumes single-thread when optimizing, and this whole thing could just go away.

DeadMG
even though it's a global variable that's clearly in use in other functions? I doubt any compiler would be *that* rude :)
Isak Savo
@Isak Savo: static (non-extern) non-volatile global variable? sure, why not? Mark all variables used for concurrency control as volatile, this prevents compiler optimizations related to these variables.
liori
@liori: But even in a single-threaded app, that is perfectly valid code. Bad architecture aside, there's nothing wrong with having a function modify global variables without using the result itself.
Isak Savo
@Isak Savo: I am not saying that this is invalid code. What I am saying is that as long as the code behaves as specified, the compiler can do anything, like removing useless global variables.
liori
@liori: Fully agree. But in this case the variable is clearly used. It's sent as an argument to printf and for all the compiler knows, that printf() could be preventing the end of the world.
Isak Savo
@Isak Savo: the compiler knows that the thread in which the `thread_reader()` function is called does not change that variable, and the variable is not volatile. Therefore it can assume that its value will not change in the loop, and load variable's value to CPU register once before the loop. AFAIK gcc will do so with -O3.
liori
(I cannot reproduce that now... but I'm sure gcc used to do that)
liori
@liori: So this code will be broken by GCC? `int myVar = 0; void f1() { myVar = 3; } void f2() { printf ("%d", myVar); } int main() { f1(); f2(); return 0;}`. I'd expect that code to print 3 on the screen.
Isak Savo
@Isak Savo: as long as everything is in one thread, it will work as you expect.
liori
Isak, if you never call it in the intervening period, the compiler may well just push the value onto the stack and leave it there. During your loop, the other function is never called, and thus the compiler is perfectly allowed to optimize it out. It can't know your threading model. As long as your code doesn't break in a single thread, the compiler is allowed to do it. It's not the compiler's job to make multi-threading work.
DeadMG
@DeadMG: Ah, I now see what you and liori mean. I was stuck in a more general case in my head, but looking at the OP's code again I understand how the compiler can optimize it away, since it won't change during the loop (as far as the compiler cares, anyway). Thanks for explaining.
Isak Savo
A: 

You said "bit-by-bit" in your question. I don't think any architecture does operations a bit at a time, except on some specialized serial protocol buses. Standard memory reads/writes are done with 8, 16, 32, or 64 bits of granularity. So it is POSSIBLE that the operation in your example is atomic.

However, the answer is heavily platform dependent.

  • It depends on the CPU's capabilities. Can the hardware do an atomic 32-bit operation? Here's a hint: If the variable you are working on is larger than the native register size (e.g. 64-bit int on a 32-bit system), it's definitely NOT atomic.
  • It depends on how the compiler generates the machine code. It could have turned your 32-bit variable access into 4x 8-bit memory reads.
  • It gets tricky if the address you are accessing is not aligned to the machine's natural word boundary. You can hit a cache fault or page fault.

It is VERY POSSIBLE that you would see a corrupt or unexpected value using the code example that you posted.

Your platform probably provides some method of doing atomic operations. In the case of a Windows platform, it is via the Interlocked functions. In the case of Linux/Unix, look at the atomic_t type.

msemack
A: 

POSIX defines the special type sig_atomic_t, which guarantees that writes to it are atomic with respect to signals; this also makes it atomic from the point of view of other threads, as you want. POSIX doesn't specifically define an atomic cross-thread type like this, since thread communication is expected to be mediated by mutexes or other synchronization primitives.

Chris Dodd
A: 

Considering modern microprocessors (and ignoring microcontrollers), the 32-bit assignment is atomic, not bit-by-bit.

However, now completely off your question's topic... the printing thread could still print something unexpected because of the lack of synchronization in this example, due to instruction reordering and to multiple cores each holding their own copy of g_uiVal in their caches.

Chris O