
I've heard that i++ isn't a thread-safe statement since in assembly it reduces down to storing the original value as a temp somewhere, incrementing it, and then replacing it, which could be interrupted by a context switch.

However, I'm wondering about ++i. As far as I can tell, this would reduce to a single assembly instruction, such as 'add r1, r1, 1', and since it's only one instruction, it'd be uninterruptible by a context switch.

Can anyone clarify? I'm assuming that an x86 platform is being used.

A: 

You say "it's only one instruction, it'd be uninterruptible by a context switch." - that's all well and good for a single CPU, but what about a dual core CPU? Then you can really have two threads accessing the same variable at the same time without any context switches.

Without knowing the language, the answer is to test the heck out of it.

Chris
You don't find out if something is thread-safe by testing it - threading issues can be one in a million occurrences. You look it up in your documentation. If it's not guaranteed threadsafe by your documentation, it's not.
Eclipse
Agree with @Josh here. Something is only thread-safe if it can be proven mathematically through an analysis of the underlying code. No amount of testing can begin to approach that.
Rex M
It was a great answer until that last sentence.
Rob K
+80  A: 

You've heard wrong. It may well be that "i++" is thread-safe for a specific compiler and specific processor architecture but it's not mandated in the standards at all. In fact, since multi-threading isn't part of the ISO C or C++ standards (yet), you can't consider anything to be thread-safe based on what you think it will compile down to.

It's quite feasible that ++i could compile to an arbitrary sequence such as:

load r0,[i]  ; load memory into reg 0
incr r0      ; increment reg 0
stor [i],r0  ; store reg 0 back to memory

which would not be thread-safe on my (imaginary) CPU that has no memory-increment instructions. Or it may be smart and compile it into:

lock         ; disable task switching (interrupts)
load r0,[i]  ; load memory into reg 0
incr r0      ; increment reg 0
stor [i],r0  ; store reg 0 back to memory
unlock       ; enable task switching (interrupts)

where lock disables and unlock enables interrupts. But, even then, this may not be thread-safe in an architecture that has more than one of these CPUs sharing memory (the lock may only disable interrupts for one CPU).

The language itself (or libraries for it, if it's not built into the language) will provide thread-safe constructs and you should use those rather than depend on your understanding (or possibly misunderstanding) of what machine code will be generated.

Things like Java synchronized and pthread_mutex_lock() (available to C under some operating systems) are what you need to look into.
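
For example, a minimal sketch of the pthread approach (POSIX assumed; the variable and function names here are only for illustration):

#include <pthread.h>

int i = 0;
pthread_mutex_t i_lock = PTHREAD_MUTEX_INITIALIZER;

void increment(void)
{
    pthread_mutex_lock(&i_lock);    // only one thread at a time gets past this
    ++i;
    pthread_mutex_unlock(&i_lock);
}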

paxdiablo
+1 for emphasizing that this is not a platform-specific issue, not to mention a clear answer...
RBerteig
congratz for your C silver badge :)
Johannes Schaub - litb
I think you should point out that no modern OS allows user-mode programs to turn interrupts off, and that pthread_mutex_lock() isn't part of C.
Bastien Léonard
@Bastien, no modern OS would be running on a CPU that didn't have a memory increment instruction :-) But your point's taken about C.
paxdiablo
I'm no expert, but even with memory increment I think you should worry about what happens if two processors increment at the same time. I think that on x86 you'd have to use a LOCK prefix (or call an atomic function).
Bastien Léonard
In any case, whenever two (or more) threads read and write the same piece of data, using atomic operations or synchronization primitives is a basic step to take.
RaphaelSP
@Bastien: Bull. RISC processors generally don't have memory increment instructions. The load/add/store triplet is how you do that on e.g., PowerPC.
derobert
+9  A: 

If you are sharing even an int across threads in a multi-core environment, you need proper memory barriers in place. This can mean using interlocked instructions (see InterlockedIncrement in Win32, for example), or using a language (or compiler) that makes certain thread-safety guarantees. With CPU-level instruction reordering, caches, and other issues, unless you have those guarantees, don't assume anything shared across threads is safe.
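
For instance, a minimal Win32 sketch (the counter name is illustrative; InterlockedIncrement expects a properly aligned LONG):

#include <windows.h>

volatile LONG counter = 0;

void bump()
{
    InterlockedIncrement(&counter);   // atomic read-modify-write; returns the new value
}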

Edit: One thing you can assume on most architectures is that if you are dealing with properly aligned single words, you won't end up with a single word containing a combination of two values mashed together. If two writes happen on top of each other, one will win and the other will be discarded. If you are careful, you can take advantage of this and see that ++i or i++ is thread-safe in the single-writer/multiple-reader situation.

Eclipse
Factually wrong in environments where int access (read/write) is atomic. There are algorithms that can work in such environments, even though the lack of memory barriers may mean you're sometimes working on stale data.
MSalters
I'm just saying that atomicity doesn't guarantee thread-safety. If you're smart enough to design lock-free data structures or algorithms, then go ahead. But you still need to know what guarantees your compiler is going to give you.
Eclipse
+25  A: 

You can't make a blanket statement about either ++i or i++. Why? Consider incrementing a 64-bit integer on a 32-bit system. Unless the underlying machine has a quad word "load, increment, store" instruction, incrementing that value is going to require multiple instructions, any of which can be interrupted by a thread context switch.

In addition, ++i isn't always "add one to the value." In a language like C, incrementing a pointer actually adds the size of the thing pointed to. That is, if i is a pointer to a 32-byte structure, ++i adds 32 bytes. Whereas almost all platforms have an "increment value at memory address" instruction that is atomic, not all have an atomic "add arbitrary value to value at memory address" instruction.
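
A small illustration of the pointer case (the 32-byte struct is hypothetical):

#include <cstdio>

struct Block { char data[32]; };      // sizeof(Block) is 32 here

int main()
{
    Block blocks[2];
    Block *i = blocks;
    ++i;                              // advances the address by sizeof(Block), not by 1
    std::printf("%ld bytes\n", (long)((char *)i - (char *)blocks));   // prints "32 bytes"
    return 0;
}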

Jim Mischel
Of course if you don't limit yourself to boring 32-bit integers, in a language like C++, ++i can really be a call to a webservice that updates a value in a database.
Eclipse
+2  A: 

Never assume that an increment will compile down to an atomic operation. Use InterlockedIncrement or whatever similar functions exist on your target platform.

Edit: I just looked up this specific question: increment on x86 is atomic on single-processor systems, but not on multiprocessor systems. Using the lock prefix can make it atomic, but it's much more portable just to use InterlockedIncrement.
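
To show the difference, here is a sketch of what the lock prefix looks like with GCC-style inline assembly (x86 assumed; this is exactly the kind of non-portable detail a wrapper like InterlockedIncrement hides):

static inline void atomic_inc(volatile int *p)
{
    __asm__ __volatile__("lock; incl %0" : "+m"(*p));   // atomic even on multiprocessor x86
}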

Dan Olson
InterlockedIncrement() is a Windows function; all my Linux boxes and modern OS X machines are x64 based, so saying InterlockedIncrement() is 'much more portable' than x86 code is rather spurious.
Pete Kirkham
It's much more portable in the same sense that C is much more portable than assembly. The goal here is to insulate yourself from relying on specific generated assembly for a specific processor. If other operating systems are your concern then InterlockedIncrement is easily wrapped.
Dan Olson
+6  A: 

If you want an atomic increment in C++ you can use C++0x libraries (the std::atomic datatype) or something like TBB.
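
A minimal sketch with the C++0x atomics (header and type names as standardized; older compilers may ship them under slightly different names):

#include <atomic>

std::atomic<int> counter(0);

void bump()
{
    ++counter;              // atomic read-modify-write
    counter.fetch_add(1);   // equivalent; returns the previous value
}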

There was once a time when the GNU coding guidelines said that updating datatypes that fit in one word was "usually safe", but that advice is wrong for SMP machines, wrong for some architectures, and wrong when using an optimizing compiler.


To clarify the "updating one-word datatype" comment:

It is possible for two CPUs on an SMP machine to write to the same memory location in the same cycle, and then try to propagate the change to the other CPUs and the cache. Even if only one word of data is being written so the writes only take one cycle to complete, they also happen simultaneously so you cannot guarantee which write succeeds. You won't get partially updated data, but one write will disappear because there is no other way to handle this case.

Compare-and-swap properly coordinates between multiple CPUs, but there is no reason to believe that every variable assignment of one-word datatypes will use compare-and-swap.
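
As a sketch of how compare-and-swap coordinates an increment (written against the C++0x std::atomic interface; the function name is only for illustration):

#include <atomic>

void cas_increment(std::atomic<int> &v)
{
    int expected = v.load();
    // If another CPU changed v between the load and the swap, expected is
    // refreshed with the current value and the loop retries.
    while (!v.compare_exchange_weak(expected, expected + 1)) {
    }
}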

And while an optimizing compiler doesn't affect how a load/store is compiled, it can change when the load/store happens, causing serious trouble if you expect your reads and writes to happen in the same order they appear in the source code (the most famous example being that double-checked locking does not work in vanilla C++).

NOTE: My original answer also said that the Intel 64-bit architecture was broken in dealing with 64-bit data. That is not true, so I edited the answer, but my edit claimed PowerPC chips were broken. That is true when reading immediate values (i.e., constants) into registers (see the two sections named "Loading pointers" under listing 2 and listing 4). But there is an instruction for loading data from memory in one cycle (lmw), so I've removed that part of my answer.

Max Lybbert
Reads and writes are atomic on most modern CPUs if your data is naturally aligned and the correct size, even with SMP and optimizing compilers. There are a lot of caveats though, especially with 64 bit machines, so it can be cumbersome to ensure your data meets the requirements on every machine.
Dan Olson
Thanks for updating. Correct, reads and writes are atomic in the sense that, as you state, they can't be half-completed, but your comment highlights how we approach this fact in practice. Same with memory barriers: they don't affect the atomic nature of the operation, only how we approach it in practice.
Dan Olson
+3  A: 

On x86/Windows in C/C++, you should not assume it is thread-safe. You should use InterlockedIncrement() and InterlockedDecrement() if you require atomic operations.

jeffamaphone
+7  A: 

They are both thread-unsafe.

A CPU cannot do math directly with memory. It does that indirectly by loading the value from memory and doing the math with CPU registers.

i++

register int a1, a2;

a1 = *(&i);   // load i from memory into register a1
a2 = a1;      // save the original value (this is what i++ evaluates to)
a1 += 1;      // increment the register
*(&i) = a1;   // store the incremented value back to memory
return a2;    // 4 cpu instructions

++i

register int a1;

a1 = *(&i);   // load i from memory into register a1
a1 += 1;      // increment the register
*(&i) = a1;   // store the incremented value back to memory
return a1;    // 3 cpu instructions

In both cases, there is a race condition that results in an unpredictable value of i.

For example, let's assume there are two concurrent ++i threads, each using registers a1 and b1 respectively, and that context switching interleaves them like this:

register int a1, b1;

a1 = *(&i);   // thread A loads i
a1 += 1;      // thread A increments its register
b1 = *(&i);   // thread B loads the same, still un-incremented, i
b1 += 1;      // thread B increments its register
*(&i) = a1;   // thread A stores i+1
*(&i) = b1;   // thread B stores i+1, overwriting thread A's update

As a result, i doesn't end up as i+2; it ends up as i+1, which is incorrect.

To remedy this, modern CPUs provide some kind of LOCK and UNLOCK CPU instructions that mark an interval during which context switching is disabled.

On Win32, use InterlockedIncrement() to do i++ for thread-safety. It's much faster than relying on a mutex.

yogman
+2  A: 

If your programming language says nothing about threads, yet runs on a multithreaded platform, how can any language construct be thread-safe?

As others have pointed out: you need to protect any multithreaded access to variables with platform-specific calls.

There are libraries out there that abstract away the platform specificity, and the upcoming C++ standard has adapted its memory model to cope with threads (and thus can guarantee thread-safety).

xtofl
+1  A: 

The 1998 C++ standard has nothing to say about threads, although the next standard (due this year or the next) does. Therefore, you can't say anything intelligent about thread-safety of operations without referring to the implementation. It's not just the processor being used, but the combination of the compiler, the OS, and the thread model.

In the absence of documentation to the contrary, I wouldn't assume that any action is thread-safe, particularly with multi-core processors (or multi-processor systems). Nor would I trust tests, as thread synchronization problems are likely to come up only by accident.

Nothing is thread-safe unless you have documentation that says it is for the particular system you're using.

David Thornley
A: 

Are we assuming that i is either global or static? I read and re-read the post, and nowhere is it mentioned that it is global or static. If it is a local variable, i++ and ++i will be thread-safe.

Alphaneo
I think it's assumed that the variable will be accessible to more than one thread, otherwise it wouldn't be an issue. :)
Ryan Emerle
A: 

I think that if the expression "i++" is the only thing in a statement, it's equivalent to "++i"; the compiler is smart enough not to keep a temporary value, etc. So if you can use them interchangeably (otherwise you wouldn't be asking which one to use), it doesn't matter which one you use, as they're almost the same (except for aesthetics).

Anyway, even if the increment operator is atomic, that doesn't guarantee that the rest of the computation will be consistent if you don't use the correct locks.

If you want to experiment for yourself, write a program where N threads concurrently increment a shared variable M times each... if the final value is less than N*M, then some increments were overwritten. Try it with both preincrement and postincrement and tell us ;-)
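
A sketch of that experiment, assuming C++0x std::thread is available (N and M are arbitrary; on most machines the printed value falls well short of N*M):

#include <iostream>
#include <thread>
#include <vector>

const int N = 4;            // threads
const int M = 1000000;      // increments per thread
int counter = 0;            // deliberately not atomic and not locked

int main()
{
    std::vector<std::thread> threads;
    for (int t = 0; t < N; ++t)
        threads.push_back(std::thread([] {
            for (int j = 0; j < M; ++j)
                ++counter;  // data race: increments can overwrite each other
        }));
    for (auto &th : threads)
        th.join();
    std::cout << counter << " (expected " << N * M << ")\n";
}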

fortran
A: 

For a counter, I recommend using the compare-and-swap idiom, which is both non-locking and thread-safe.

Here it is in Java:

public class IntCompareAndSwap {
    private int value = 0;

    public synchronized int get() { return value; }

    // Sets value to p_newValue only if it still equals p_expectedValue;
    // always returns the value observed before the attempt.
    public synchronized int compareAndSwap(int p_expectedValue, int p_newValue) {
        int oldValue = value;

        if (oldValue == p_expectedValue)
            value = p_newValue;

        return oldValue;
    }
}

public class IntCASCounter {

    public IntCASCounter() {
        m_value = new IntCompareAndSwap();
    }

    private IntCompareAndSwap m_value;

    public int getValue() { return m_value.get(); }

    public void increment() {
        int temp;
        do {
            temp = m_value.get();
        } while (temp != m_value.compareAndSwap(temp, temp + 1)); // retry if another thread changed the value
    }

    public void decrement() {
        int temp;
        do {
            temp = m_value.get();
        } while (temp > 0 && temp != m_value.compareAndSwap(temp, temp - 1)); // never decrements below zero
    }
}
AtariPete
Seems similar to a test_and_set function.
samoz
You wrote "non locking", but doesn't "synchronized" mean locking?
Corey Trager
A: 

Throw i into thread local storage; it isn't atomic, but it then doesn't matter.
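
A sketch of the thread-local idea, assuming C++11's thread_local keyword (older toolchains spell it __thread or __declspec(thread)):

thread_local int i = 0;     // each thread gets its own copy

void worker()
{
    for (int n = 0; n < 1000; ++n)
        ++i;                // no other thread ever touches this i
}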

A: 

Even if it is reduced to a single assembly instruction, incrementing the value directly in memory, it is still not thread safe.

When incrementing a value in memory, the hardware does a "read-modify-write" operation: it reads the value from memory, increments it, and writes it back to memory. The x86 hardware has no way of incrementing a value directly in memory; the RAM (and the caches) can only read and store values, not modify them.

Now suppose you have two separate cores, either on separate sockets or sharing a single socket (with or without a shared cache). The first processor reads the value, and before it can write back the updated value, the second processor reads it. After both processors write the value back, it will have been incremented only once, not twice.

There is a way to avoid this problem; x86 processors (and most multi-core processors you will find) are able to detect this kind of conflict in hardware and sequence it, so that the whole read-modify-write sequence appears atomic. However, since this is very costly, it is only done when requested by the code, on x86 usually via the LOCK prefix. Other architectures can do this in other ways, with similar results; for instance, load-linked/store-conditional and atomic compare-and-swap (recent x86 processors also have this last one).

Note that using volatile does not help here; it only tells the compiler that the variable might be modified externally and that reads of that variable must not be cached in a register or optimized out. It does not make the compiler use atomic primitives.

The best way is to use atomic primitives (if your compiler or libraries have them), or do the increment directly in assembly (using the correct atomic instructions).
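
For example, a sketch using a compiler intrinsic (GCC's __sync builtins assumed; on x86 this compiles down to a LOCK-prefixed instruction):

static inline int atomic_increment(volatile int *p)
{
    return __sync_add_and_fetch(p, 1);   // atomic ++*p; returns the new value
}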

CesarB