views:

816

answers:

6

Assume:

A. C++ under WIN32.

B. A properly aligned volatile integer incremented and decremented using InterlockedIncrement() and InterlockedDecrement().

__declspec (align(8)) volatile LONG _ServerState = 0;

If I want to simply read _ServerState, do I need to read the variable via an InterlockedXXX function?

For instance, I have seen code such as:

LONG x = InterlockedExchange(&_ServerState, _ServerState);

and

LONG x = InterlockedCompareExchange(&_ServerState, _ServerState, _ServerState);

The goal is to simply read the current value of _ServerState.

Can't I simply say:

if (_ServerState == some value)
{
// blah blah blah
}

There seems to be some confusion WRT this subject. I understand register-sized reads are atomic in Windows, so I would assume the InterlockedXXX function is unnecessary.

Matt J.


Okay, thanks for the responses. BTW, this is Visual C++ 2005 and 2008.

If it's true I should use an InterlockedXXX function to read the value of _ServerState, even if just for the sake of clarity, what's the best way to go about that?

LONG x = InterlockedExchange(&_ServerState, _ServerState);

This has the side effect of modifying the value, when all I really want to do is read it. Not only that, but there is a possibility that I could reset the flag to the wrong value if there is a context switch as the value of _ServerState is pushed on the stack in preparation of calling InterlockedExchange().

LONG x = InterlockedCompareExchange(&_ServerState, _ServerState, _ServerState);

I took this from an example I saw on MSDN.
See http://msdn.microsoft.com/en-us/library/ms686355(VS.85).aspx

All I need is something along the lines:

lock mov eax, [_ServerState]

In any case, the point, which I thought was clear, is to provide thread-safe access to a flag without incurring the overhead of a critical section. I have seen LONGs used this way via the InterlockedXXX() family of functions, hence my question.

Okay, we are thinking a good solution to this problem of reading the current value is:

LONG Cur = InterlockedCompareExchange(&_ServerState, 0, 0);
A: 

you should be okay. It's volatile, so the optimizer shouldn't savage you, and it's a 32-bit value so it should be at least approximately atomic. The one possible surprise is if the instruction pipeline can get around that.

On the other hand, what's the additional cost of using the guarded routines?

Charlie Martin
+1  A: 

Current value reading may not need any lock.

Alphaneo
As long as it's no wider than the width of the word, and the variable _is_ marked as volatile, it won't. If you use a non-volatile variable and write to it via Interlocked, you _can't_ just read it, since you may see the cached value.
romkyns
A: 

The Interlocked* functions prevent two different processors from accessing the same piece of memory. In a single processor system you are going to be ok. If you have a dual-core system where you have threads on different cores both accessing this value, you might have problems doing what you think is atomic without the Interlocked*.

Doug T.
+1  A: 

Interlocked instructions provide atomicity and inter-processor synchronization. Both writes and reads must be synchronized, so yes, you should be using interlocked instructions to read a value that is shared between threads and not protected by a lock. Lock-free programming (and that's what you're doing) is a very tricky area, so you might consider using locks instead. Unless this is really one of your program's bottlenecks that must be optimized?

Bartosz Milewski
+4  A: 

It depends on what you mean by "goal is to simply read the current value of _ServerState" and it depends on what set of tools and the platform you use (you specify Win32 and C++, but not which C++ compiler, and that may matter).

If you simply want to read the value such that the value is uncorrupted (ie., if some other processor is changing the value from 0x12345678 to 0x87654321 your read will get one of those 2 values and not 0x12344321) then simply reading will be OK as long as the variable is :

  • marked volatile,
  • properly aligned, and
  • read using a single instruction with a word size that the processor handles atomically

None of this is promised by the C/C++ standard, but Windows and MSVC do make these guarantees, and I think that most compilers that target Win32 do as well.

However, if you want your read to be synchronized with behavior of the other thread, there's some additional complexity. Say that you have a simple 'mailbox' protocol:

struct mailbox_struct {
    uint32_t flag;
    uint32_t data;
};
typedef struct mailbox_struct volatile mailbox;


// the global - initialized before wither thread starts

mailbox mbox = { 0, 0 };

//***************************
// Thread A

while (mbox.flag == 0) { 
    /* spin... */ 
}

uint32_t data = mbox.data;

//***************************

//***************************
// Thread B

mbox.data = some_very_important_value;
mbox.flag = 1;

//***************************

The thinking is Thread A will spin waiting for mbox.flag to indicate mbox.data has a valid piece of information. Thread B will write some data into mailbox.data then will set mbox.flag to 1 as a signal that mbox.data is valid.

In this case a simple read in Thread A of mbox.flag might get the value 1 even though a subsequent read of mbox.data in Thread A does not get the value written by Thread B.

This is because even though the compiler will not reorder the Thread B writes to mbox.data and mbox.flag, the processor and/or cache might. C/C++ guarantees that the compiler will generate code such that Thread B will write to mbox.data before it writes to mbox.flag, but the processor and cache might have a different idea - special handling called 'memory barriers' or 'acquire and release semantics' must be used to ensure ordering below the level of the thread's stream of instructions.

I'm not sure if compilers other than MSVC make any claims about ordering below the instruction level. However MS does guarantee that for MSVC volatile is enough - MS specifies that volatile writes have release semantics and volatile reads have acquire semantics - though I'm not sure at which version of MSVC this applies - see http://msdn.microsoft.com/en-us/library/12a04hfd.aspx?ppud=4.

I have also seen code like you describe that uses Interlocked APIs to perform simple reads and writes to shared locations. My take on the matter is to use the Interlocked APIs. Lock free inter-thread communication is full of very difficult to understand and subtle pitfalls, and trying to take a shortcut on a critical bit of code that may end up with a very difficult to diagnose bug doesn't seem like a good idea to me. Also, using an Interlocked API screams to anyone maintaining the code, "this is data access that needs to be shared or synchronized with something else - tread carefully!".

Also when using the Interlocked API you're taking the specifics of the hardware and the compiler out of the picture - the platform makes sure all of that stuff is dealt with properly - no more wondering...

Read Herb Sutter's Effective Concurrency articles on DDJ (which happen to be down at the moment, for me at least) for good information on this topic.

Michael Burr
Interlocked APIs also scream "thread carefully." :)
mskfisher
A: 

32-bit read operations are already atomic on 32-bit systems, 64-bit read operations are not atomic. But you shouldn't use this for threads synchronization.

If you need a flag some sort you should consider using Event object and WaitForSingleObject function for that purpose.

Kirill V. Lyadvinsky