
Hi all..

I've been thinking, just how deep into everything do you have to go before something is automatically thread-safe?

Quick example:

int dat = 0;
void SetInt(int data)
{
    dat = data;
}

.. Would this method be considered threadsafe? I usually wrap all my set-methods in mutexes, just to be sure, but every time I do so I can't help but think that it's a useless performance overhead. I guess it all breaks down to the assembly the compiler generates? When are threads able to break into code? Per assembly instruction or per code line? Can a thread break in during the set-up or destruction of a method's stack? Would an instruction like i++ be considered threadsafe - and if not, what about ++i?

Lots of questions here - and I don't expect a direct answer, but some info on the subject would be great :)

[UPDATE] Since it's clear to me now (thx to you guys <3 ) that the only atomic-guaranteed stuff in threading is an assembly instruction, I now came to think: What about mutex and semaphore wrapper classes? Classes like this usually use methods which make call stacks - and custom semaphore classes that usually utilize some kind of internal counter can not be guaranteed to be atomic / threadsafe (whatever you wanna call it, as long as you know what I mean, I don't care :P )

+3  A: 

In general, a thread context switch can happen at any time, between any two assembly language instructions. The CPU is completely unaware of how the assembly language maps to your source code. Furthermore, with multiple processors, other instructions can be executing on a different CPU core at the very same time.

Having said that, in the example you gave the assignment of a CPU-sized word to a memory location is generally an atomic operation. This means that from the point of view of an observer (another thread), the assignment has either not started yet, or has been completed. There is no in-between state.

There are many subtleties in multiprocessing, so it's good to be aware of the possibilities for the hardware and OS environment in which you're working.
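To make that guarantee explicit rather than relying on word size, here is a hedged sketch of the question's setter, assuming a C++11 environment (the original snippet doesn't say which language or standard; `std::atomic` is my substitution):

```cpp
#include <atomic>
#include <cassert>

// Sketch of the question's SetInt using std::atomic<int> (an assumption -
// the original uses a plain int). The store is then atomic by the language
// standard itself, not by accident of the target CPU's word size.
std::atomic<int> dat{0};

void SetInt(int data)
{
    // Readers observe either the old value or the new one,
    // never a torn in-between state.
    dat.store(data);
}

int GetInt()
{
    return dat.load();
}
```

With a plain `int`, the same guarantee depends on alignment and platform; with `std::atomic` it is portable.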

Greg Hewgill
Yes, I thought so.. Would a non-CPU-sized datatype like a double be threadsafe in an assignment then? And with an int assignment being an atomic operation, don't you have the good ol' transaction-sync problem? Oh my, and portability issues. Oh well, guess it's back to looking at disassemblies..
Meeh
Greg, you forgot that the member (presuming this setter is in a class) has to be dereferenced. As far as I know (and I know little), the resulting operation is *not* always atomic.
Konrad Rudolph
Not only the operation is not atomic (and the word size is not an argument), but it's not fenced. Either you use specific keywords to guarantee atomicity or you use mutexes and the like.
Edouard A.
> "between any two assembly language instructions"Some assembly instructions (like add) must read, modify, then write back to that memory. For multi-core CPUS, even a single assembly instruction can be unsafe. If you can't use mutexes, then you need to use 'cmpxchg' or similar compiler intrinsic.
Kevin
+1  A: 

The only way to make sure that something is automatically threadsafe is to make sure that there is no mutable shared state. This is why functional programming is gaining traction these days.

So, if all your threads share X, then you must make sure that X does not change. Any variables that do change must be local to that thread.
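A minimal sketch of that rule, assuming C++11 (the `shared_config`/`bump` names are mine, for illustration): shared data is immutable, mutable data is per-thread, so neither needs a lock.

```cpp
#include <cassert>

// Shared across threads but never written: safe to read concurrently.
const int shared_config = 100;

// Mutable, but thread-local: each thread gets its own copy (C++11
// thread_local), so no synchronization is needed.
thread_local int local_counter = 0;

int bump()
{
    return ++local_counter;   // mutates only this thread's copy
}
```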

Charles Graham
+2  A: 

Thread state can change between any two machine instructions. If the computer is able to perform the assignment in a single machine instruction, then the assignment should be thread-safe on a single-processor machine. In general, it is not safe to assume that the result of the computation on the right-hand side of an assignment can be computed and stored in the location specified by the left-hand side in a single instruction. On some processors there may be no memory-to-memory copy instruction available, and the data may need to be loaded into a register first. If a context switch happens between the load and store instructions, then the outcome of the assignment is indeterminate (not thread-safe). This is one reason why most instruction sets contain an atomic test-and-set operation that allows you to use a memory location as a lock: other threads can check the lock's availability and wait to proceed until the lock has been obtained.

In your case, I'm not sure it matters whether the operation completes in a thread-safe manner at the hardware level, since the result of multiple competing threads performing the assignment would simply be that one of them completes the store last and "wins". If you were performing any sort of calculation on the right-hand side that involved more than one variable, though, I would definitely put it in a critical section, since you would want the results of the computation to be consistent with the state of those variables when the computation starts. If not in a critical section, the variables could have their values changed mid-stream by another thread, and you could end up with a result that would not be possible from any one thread.
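The test-and-set idea described above can be sketched as a minimal spinlock, assuming C++11 (`std::atomic_flag::test_and_set` is the standard's guaranteed-lock-free form of the primitive; the class and counter here are my illustration, not the answer's code):

```cpp
#include <atomic>
#include <thread>
#include <cassert>

// Minimal spinlock built on atomic test-and-set.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        // Atomically set the flag and observe its previous state;
        // spin until this thread is the one that flipped it from clear.
        while (flag_.test_and_set(std::memory_order_acquire)) { }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

SpinLock lock_;
int counter = 0;            // shared state, protected by lock_

void add_many(int n)
{
    for (int i = 0; i < n; ++i) {
        lock_.lock();
        ++counter;          // read-modify-write, now serialized
        lock_.unlock();
    }
}
```

Two threads each running `add_many(100000)` end with `counter == 200000`; without the lock, updates would be lost.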

tvanfosson
+1  A: 

This is not thread-safe, and it does not end well in all kinds of situations.

Suppose the dat variable holds the count of elements in an array. Another thread begins to scan the array using the dat variable, and its value is cached. In the meantime, you change the value of dat. The other thread scans the array again for some other operation. Does it use the old dat value or the new one? We do not know and we cannot be sure. Depending on how the module was compiled, it may use the old cached value or the new value; either case is troublesome.

You may explicitly cache the value of the dat variable on the other thread for more predictable results. For example, if this dat variable holds a timeout value and you only write to this value while the other thread reads it, then I do not see a problem here. Even if this is the case, you can not say this is thread-safe!!!

Malkocoglu
The code is thread safe by the normal definition. The above situation would arise only when the calling program is not thread safe.
James Anderson
A: 

The above code is threadsafe!

The main thing to look out for is static (i.e. shared) variables.

These are not thread safe unless updates are managed by some sort of locking mechanism such as a mutex. The same obviously applies to any OS-provided shared memory.

So as long as your code has no static data it will be thread safe in itself.

You then need to check whether any libraries or system calls you use are thread safe. This is stated explicitly in the documentation of most system calls.

James Anderson
Arrgggh! Must be going "code blind" - this is not thread safe at all! Sorry.
James Anderson
A: 

The increment operation isn't thread safe on x86 processors because it is not atomic. On Windows you need to call the InterlockedIncrement function, which generates a full memory barrier. You can also use tbb::atomic from the Intel Threading Building Blocks (TBB) library.
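A portable sketch of the same idea, assuming C++11 rather than the Win32 API: `std::atomic<int>::fetch_add` with its default sequentially consistent ordering gives an atomic increment with full-barrier semantics, much like InterlockedIncrement (the `hits`/`hit_many` names are mine, for illustration):

```cpp
#include <atomic>
#include <thread>
#include <cassert>

std::atomic<int> hits{0};

void hit_many(int n)
{
    for (int i = 0; i < n; ++i)
        hits.fetch_add(1);   // atomic; a plain ++ on an int could lose updates
}
```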

Lazin
+1  A: 

Assignment of "native" datatypes (32bit) is atomic on most platforms (including x86). That means the assignment will happen completely, and you don't risk having a "halfway updated" dat variable. But that is the only guarantee you get.

I'm not sure about an assignment of a double datatype. You could look it up in the x86 specs, or check if .NET makes any explicit guarantees. But in general, datatypes that aren't "native size" will not be atomic. Even smaller ones, like bool, may not be (because to write a bool, you may have to read an entire 32-bit word, overwrite one byte, then write the entire 32-bit word again).
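Rather than digging through the x86 specs, a C++11 program can ask the implementation directly (my sketch, assuming C++11; the helper name is hypothetical): `is_lock_free()` reports whether atomic operations on a type compile down to lock-free instructions on the current platform.

```cpp
#include <atomic>
#include <cassert>

// Probe whether std::atomic<T> is lock-free here, i.e. whether plain
// loads/stores of T could plausibly be atomic on this platform.
template <typename T>
bool lock_free_here()
{
    std::atomic<T> probe{};
    return probe.is_lock_free();
}
```

On mainstream desktop platforms this returns true for `int`; for `double` or wider types the answer varies by CPU, which is exactly the portability problem the answer describes.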

In general, threads can be interrupted between any two assembly instructions. That means your code above is thread safe as long as you don't try to read from dat (which, you might argue, makes it fairly useless).

Atomicity and thread safety are not quite the same thing. Thread safety depends entirely on the context. Your assignment to dat is atomic, so another thread reading the dat value will either see the old or the new value, but never an "in between" one. But that doesn't make it thread safe. Another thread might read the old value (say it's the size of an array), and perform an operation based on that. But you might update dat immediately after it read the old value, perhaps setting it to a smaller value. The other thread might now access your new, smaller array, but believe it to have the old, larger size.

i++ and ++i are not thread safe either, because they consist of multiple operations (read value, increment value, write value), and in general, anything that consists of both reads and writes is not thread safe. Threads can also be interrupted while setting up the call stack for a function call, yes. After any assembler instruction.

jalf
Great answer! Thx a lot :)
Meeh
+1  A: 

Well, I don't believe everything has to be thread safe. Since there's a cost in both complexity and performance for making code thread safe, you should ask yourself if your code needs to be thread safe before you implement anything. In many cases you can restrict thread awareness to specific parts of your code.

Obviously that requires some thinking and planning, but so does writing thread safe code.

Brian Rasmussen
+2  A: 

considerations:

1) Compiler optimization - does "dat" even exist as you planned? Unless it is part of the "externally observable" behavior, the C/C++ abstract machine does not guarantee the compiler won't optimize it out. There might be no "dat" at all in your binary code; instead you may be writing to a register, and threads will/may have different registers. Read the C/C++ standard on the abstract machine, or simply google for "volatile" and explore from there. The C/C++ standard cares about single-thread sanity; multiple threads can stumble over such optimizations easily.

2) Atomic stores. Anything that has a chance of crossing a word boundary will not be atomic. ints usually are atomic, unless you pack them into a structure that has, for example, chars, and use directives to remove padding. But you need to analyze this aspect every time. Research your platform, google for "padding". Keep in mind that different CPUs have different rules.

3) Multi-CPU issues. You wrote to "dat" on CPU0. Will the change even be seen on CPU1? Or will you just write to a local register? To cache? Are caches kept coherent on your platform? Is access guaranteed to be kept in order? Read up on the "weak memory model". Google for "memory-barriers.txt Linux" - it's a good start.

4) The use case. You intend to use "dat" after assignment - is that synchronized? But this, I guess, is obvious.

Usually "thread safety" does not go beyond guaranteeing that a function will work if called from different threads at the same time, but those calls must not be inter-dependent, i.e., they don't exchange any data with regard to that call. For example, you call malloc() from thread1 and thread2 and they both get memory, but they don't access each other's memory.

A counter-example would be strtok() which is not thread safe and would break even on unrelated calls.

As soon as your threads start to talk to each other over data, the usual thread safety doesn't guarantee much.
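The visibility and ordering points above (1-3) can be sketched with C++11 release/acquire ordering (my illustration, assuming C++11; the producer/consumer names are hypothetical): the flag's ordering guarantees the plain write to the payload is visible on the other CPU before the flag reads as true.

```cpp
#include <atomic>
#include <thread>
#include <cassert>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // publication flag

void producer()
{
    payload = 42;                                   // plain write
    ready.store(true, std::memory_order_release);   // publish: write above
                                                    // cannot be reordered past this
}

int consume()
{
    while (!ready.load(std::memory_order_acquire)) { }  // wait for the flag
    return payload;   // guaranteed to see 42, not a stale cached value
}
```

Without the release/acquire pair, nothing stops the compiler or CPU from reordering the stores, or from the consumer seeing the flag but a stale payload.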

n-alexander
+1 for covering incoherent caches. The folks talking about threads being interrupted "between assembler instructions" completely miss this point, that just because your operation has executed fully doesn't mean other threads will see the result now, soon, or ever...
Steve Jessop
A: 

There is a lot of research going into transactional memory - something similar to DB transactions, but at a much finer grain.

Theoretically this allows multiple threads to read, write, and do anything they like with an object. But all operations on an object are transaction-aware. If a thread modifies an object's state (and completes its transaction), all other threads that have open transactions on the object will be rolled back and restarted automatically.

This is done at the hardware level so software does not need to get involved in the problems associated with locking.

Nice theory. Can't wait for it to become reality.

Martin York
Sounds very sexy indeed! But prolly waaay into the future :(
Meeh
I've read a paper on a software implementation of transactional memory, I think in Haskell, which allowed the compiler to enforce a certain amount of safety (e.g. a transactional function can't call an irreversible one). Obviously hardware help could make that kind of thing more performant.
Steve Jessop