views:

55

answers:

4

I have a bunch of threads that are doing lots of communication with each other. I would prefer this be lock free.

For each thread, I want to have a mailbox, where other threads can send it messages, (but only the owner can remove messages). This is a multiple-producer single-consumer situation. is it possible for me to do this in a lockfree / high performance matter? (This is in the inner loop of a gigantic simulation.)

Thanks!

A: 

may want to look at Intel thread building blocks, I recall being to lecture by Intel developer that mentioned something along those lines.

aaa
+1  A: 

Here's a paper from the University of Rochester illustrating a non-blocking concurrent queue. The algorithm described in the paper shows one technique for making a lockless queue.

Reed Copsey
Java's ConcurrentLinkedQueue implements the algorithm in the paper, FYI.
Will Hartung
I believe it's also, roughly, the basis for the net ConcurrentQueue<T> implementation in .NET: http://blogs.msdn.com/pfxteam/archive/2010/01/26/9953725.aspx
Reed Copsey
+1  A: 

Sure, if you have an atomic CompareAndSwap instruction:

for (i = 0; ; i = (i + 1) % MAILBOX_SIZE)
{
    if ((mailbox[i].owned == false) &&
        (CompareAndSwap(&mailbox[i].owned, true, false) == false))
        break;
}

mailbox[i].message = message;
mailbox[i].ready = true;

After reading a message, the consuming thread just sets mailbox[i].ready = false; mailbox[i].owned = false; (in that order).

caf
you would need acquire/release memory barriers on the ready = true/false and owned = false settings. As well as within the CompareAndSwap, but usually that is just part of the function.
tony
That depends entirely upon your architecture's memory ordering rules. If writes are not reordered with respect to other writes issuing from the same processor (which is common), then the memory barrier with `CompareAndSwap` itself is sufficient for correctness.
caf
A: 

Lock-free Multiple Producer Single Consumer (MPSC) Queue is one of the easiest lock-free algorithms to implement.

The most basic implementation requires a simple lock-free singly-linked list (SList) with only push() and flush(). The functions are available in the Windows API as InterlockedFlushSList() and InterlockedPushEntrySList() but these are very easy to roll on your own.

Multiple Producer push() items onto the SList using a CAS (interlocked compare-and-swap).

The Single Consumer does a flush() which swaps the heap of the SList with a NULL using an XCHG (interlocked exchange). The Consumer then has a list of items in the reverse-order.

To process the items in order, you must simply reverse the list returned from flush() before processing it. If you do not care about order, you can simply walk the list immediately to process it.

Two notes if you roll your own functions:

1) If you are on a system with weak memory ordering (i.e. PowerPC), you need to put a "release memory barrier" at the beginning of the push() function and an "aquire memory barrier" at the end of the flush() function.

2) You can make the functions considerably simplified and optimized because the ABA-issue with SLists occur during the pop() function. You can not have ABA-issues with a SList if you use only push() and flush(). This means you can implement it as a single pointer very similar to the non-lockfree code and there is no need for an ABA-prevention sequence counter.

Adisak