ansaurus

Question

Memory barriers vs. interlocked operations

Answer 1

A:

Some barriers (such as the powerpc isync, and a .acq load on ia64) also have an effect on ubsequent loads. ie: if a load was available before the isync due to prefetching it must be discarded. When used appropriately perhaps that's enough to make Dekker's algorithm work on a weak memory model.

You've also got cache invalidation logic working for you. If you know that your load is current due to something like an isync and that the cached version of the data is invalidated if another cpu touches it, is that enough?

Interesting questions aside, Dekker's algorithm is for all practical purposes dumb. You are going to want to use atomic hardware interfaces and memory barriers for any real application, so focusing on how to fix up Dekker's with atomics just doesn't seem worthwhile to me;)

Peeter Joot 2010-07-23 17:07:09

<data is invalidated if another cpu touches it, is that enough?> My question concerns weak models that do not make these kinds of guarantees. <Dekker's algorithm is for all practical purposes dumb.> Yes. What I'm trying to get at is whether or not you can turn any (or most) algorithms that work on a strong model into ones that work on a weak model by putting in "enough" memory barriers.

binarycoder 2010-07-23 23:32:11

Answer 2

A:

Say you put in a load and store barrier after every statement, and in addition you ensured that the compiler didn't reorder your stores. Wouldn't this, on any reasonable architecture, provide strict consistency? Dekker's works on sequentially consistent architectures. Sequential consistency is a weaker condition than strict consistency.

http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html

Even on a CPU that has a weak consistency model, you'd still expect cache coherence. I think that where things get derailed is the behavior of store buffers and speculated reads, and what operations are available flush stored writes and invalidate speculated reads. If there isn't a load fence that can invalidate speculated reads, or there isn't a write fence that flushes a store buffer, in addition to not being able to implement Dekker's, you won't be able to implement a mutex!

So here's my claim. If you have a write barrier available, and a read barrier, and the cache is coherent between CPUs, then you can trivially make all code sequentially consistent by flushing writes (store fence) after every instruction, and flushing speculations (read fence) before every instruction. So I claim that you don't need atomics for what you're talking about, and that you can do what you need with Dekker's only. Sure you wouldn't want to.

BTW, Corensic, the company I work for, writes cool tools for debugging concurrency issues. Check out http://www.corensic.com.

pgod 2010-07-24 20:00:55

Answer 3

+2 A:

You are right that a memory barrier cannot ensure that a read sees up-to-date values. What it does do is enforce an ordering between operations, both on a single thread, and between threads.

For example, if thread A does a series of stores and then executes a release barrier before a final store to a flag location, and thread B reads from the flag location and then executes an acquire barrier before reading the other values then the other variables will have the values stored by thread A:

// initially x=y=z=flag=0

// thread A
x=1;
y=2;
z=3;
release_barrier();
flag=1;

// thread B
while(flag==0) ; // loop until flag is 1
acquire_barrier();
assert(x==1);  // asserts will not fire
assert(y==2);
assert(z==3);

Of course, you need to ensure that the load and store to flag is atomic (which simple load and store instructions are on most common CPUs, provided the variables are suitably aligned). Without the while loop on thread B, thread B may well read a stale value (0) for flag, and thus you cannot guarantee any of the values read for the other variables.

Fences can thus be used to enforce synchronization in Dekker's algorithm.

Here's an example implementation in C++ (using C++0x atomic variables):

std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);

void p0()
{
    flag0.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 0)
        {
            flag0.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0)
            {
            }
            flag0.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    // critical section


    turn.store(1,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag0.store(false,std::memory_order_relaxed);
}

void p1()
{
    flag1.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 1)
        {
            flag1.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1)
            {
            }
            flag1.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    // critical section


    turn.store(0,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag1.store(false,std::memory_order_relaxed);
}

For a full analysis see my blog entry at http://www.justsoftwaresolutions.co.uk/threading/implementing_dekkers_algorithm_with_fences.html

Anthony Williams 2010-07-26 21:24:55

AFAICT, for Dekker's, it is not sufficient to know that the flag was clear some time in the past but rather that it is safe to enter the critical section right now. Sounds like I need the up-to-date value, and I don't see how you get that with memory barriers (as you say right in your first sentence).

binarycoder 2010-07-27 00:51:14

You just need a stronger barrier than that I just showed --- a "full fence". I'll update my answer to show Dekker's with barriers later.

Anthony Williams 2010-07-27 06:50:53

ansaurus

tags:

views:

answers:

Memory barriers vs. interlocked operations

related questions