views:

254

answers:

4

I'm running into a situation where I need the atomic sum of two values in memory. The code I inherited goes like this:

int a = *MemoryLocationOne;
memory_fence();
int b = *MemoryLocationTwo;
return (a + b) == 0;

The individual reads of a and b are atomic, and all writes elsewhere in the code to these two memory locations are also lockless atomic. However the problem is that the values of the two locations can and do change between the two reads.

So how do I make this operation atomic? I know all about CAS, but it tends to only involve making read-modify-write operations atomic and that's not quite what I want to do here.

Is there a way to do it, or is the best option to refactor the code so that I only need to check one value?

Edit: Thanks, I didn't mention that I wanted to do this locklessly in the first revision, but some people picked up on it after my second revision. I know no one believes people when they say things like this, but I can't use locks practically. I'd have to emulate a mutex with atomics and that'd be more work than refactoring the code to keep track of one value instead of two.

For now my method of investigation involves taking advantage of the fact that the values are consecutive and grabbing them atomically with a 64 bit read, which I'm assured are atomic on my target platforms. If anyone has new ideas, please contribute! Thanks.

+1  A: 

You would have to ensure that everywhere either of the two values were read or written, they were surrounded by a memory barrier (lock or critical section).

// all reads...
lock(lockProtectingAllAccessToMemoryOneAndTwo)
{
    a = *MemoryLocationOne;
    b = *MemoryLocationTwo;
}

...

// all writes...
lock(lockProtectingAllAccessToMemoryOneAndTwo)
{
    *MemoryLocationOne = someValue;
    *MemoryLocationTwo = someOtherValue;
}
Mitch Wheat
+1  A: 

If you truly need to ensure that a and b don't change while you are doing this test, then you need to use the same synchronization for all access to a and b. That's your only choice. Each read and each write to either of these values needs to use the same memory fence, synchronizer, semaphore, timeslice lock, or whatever mechanism is used.

With this, you can ensure that if you:

memory_fence_start();
int a = *MemoryLocationOne;
int b = *MemoryLocationTwo;
int test = (a + b) == 0;
memory_fence_stop();

return test;

then a will not change while you are reading b. But again, you have to use the same synchronization mechanism for all access to a and to b.

To reflect a later edit to your question that you are looking for a lock-free method, well, it depends entirely on the processor you are using and on how long a and b are and on whether or not these memory locations are consecutive and aligned properly.

Assuming these are consecutive in memory and 32 bits each and that your processor has an atomic 64-bit read, then you can issue an atomic 64-bit read to read the two values in, parse the two values out of the 64-bit value, do the math and return what you want to return. Assuming you never need an atomic update to "a and b at the same time" but only atomic updates to "a" or to "b" in isolation, then this will do what you want without locks.

Eddie
They happen to be consecutive 32 bit addresses and the processors I need to code to work on have atomic 64-bit reads if aligned correctly, so this looks like the way forward.
Dan Olson
Ah, in that case you can either write some assembly or just check the assembly output of your compiler to check that using a 64-bit type will do the right thing. In that case, you can go lock-free.
Eddie
+1  A: 

There really is no way to do this without a lock. No processors have a double atomic read, as far as I know.

Zifre
Even if a processor had a double atomic read, that would only help if the two memory addresses in question were properly aligned and consecutive.
Eddie
No, a processor could allow an atomic read of two arbitrary memory locations.
Zifre
I'm not aware of any processor that allows a guaranteed atomic read of non-consecutive arbitrary memory locations, even in a multi-processor environment, but I'll take your word for it.
Eddie
Why is this getting upvoted when it is clearly wrong?
Eddie
I'm not saying that there is a processor that allows this, just that it is theoretically possible.
Zifre
DCAS - atomic compare and swap of two seperate memory locations - was implemented in the Motorola 68030. It is not present in modern CPUs. CAS2 - atomic compare and swap of two contigious words of memory - is implemented in x86/amd64 and I think a number of other modern CPUs.
Blank Xavier
+2  A: 

If you are targeting x86, you can use the 64-bit compare/exchange support and pack both int's into a single 64-bit word.

On Windows, you would do this:

// Skipping ensuring padding.
union Data
{
     struct members
     {
         int a;
         int b;
     };

     LONGLONG _64bitData;  
};

Data* data;


Data captured;

do
{
    captured = *data;
    int result = captured.members.a + captured.members.b;
} while (InterlockedCompareExchange64((LONGLONG*)&data->_64bitData,
                    captured._64BitData,
                    captured._64bitData) != captured._64BitData);

Really ugly. I'd suggest using a lock - much more maintainable.

EDIT: To update and read the individual parts:

data->members.a = 0;
fence();

data->members.b = 0;
fence();

int captured = data->members.a;

int captured = data->members.b;
Michael
This is true, but you're screwed when you need to update only one of those elements.
Andrew Grant
No, you do underlying atomic read/writes to the 32-bit parts.
Michael
And in case it wasn't clear, my answer was to show how it could be done - I always recommend avoiding lock free algorithms.
Michael
@Michael: What's wrong with lock-free algorithms when they actually work? If you're working with a processor that provides atomicity guarantees, then it's perfectly safe.
Eddie
Especially on embedded systems where you may not have the luxury of the latest, fastest chips and optimizing compilers, the speed difference between a locking and a lock-free algorithm can be important.
Eddie
@Eddie: Lock-free is harder to understand and easier to get wrong. You may know all the in's and out's, but the guy who owns the code after you may not and can easily screw up.
Michael
@Michael: Yes, you are correct. On non-embedded systems, it is usually best to either tightly encapsulate lock-free-dependent code or to use locking algorithms for this reason. I am making the assumption that the OP is working on an embedded system.
Eddie
@Michael: I will have to examine your method more closely later. The read is mostly the concern, not the add, and you've done the read atomically without needing a CompareExchange by expanding to 64 bits. If the method can be adapted well it could be useful backup though, thanks.
Dan Olson
@Dan: The compiler will probably expand it to two 32-bit reads. The 64-bit CompareExchange validates that it hasn't changed during the capture, and will continue to loop if it has.
Michael
Nooo...the two-word functions translate directly into machine code. CAS2 is a specific x86/amd64 instruction.
Blank Xavier