is there any C++ implementation (source codes) of "optmistic approach to lock-free FIFO queues" algorithm?
+4
A:
Herb Sutter covered just such a queue as part of his Effective Concurency column in Dr. Dobbs Journal.
greyfade
2010-05-31 18:58:36
that is theory, what i'am asking is the implementation. is there any source codes or library that implement those algorithm?
uray
2010-05-31 19:34:09
@uray: Yes. In the article.
greyfade
2010-05-31 19:41:42
where? i don't see any source code files there.
uray
2010-05-31 19:50:10
Did you even read the article? *All of page 2* is annotated source code.
greyfade
2010-05-31 19:53:46
ok, sorry about that, i'am expecting it is wrapped as library or something... so I just include the source to my project and use it.is there any benchmark of this algorithm compared to the paper I refer above?
uray
2010-05-31 20:28:23
Compared *to*, no. But there are benchmarks in his follow-up articles, linked from his website: EC #16 through EC #18.
greyfade
2010-05-31 21:20:14
do you have those implementation of atomic<T> ?
uray
2010-05-31 22:17:16
No, that's something that's in C++0x. You can find libraries that have a similar type, or you can roll your own. Setting an `atomic<>` is a simple matter of a compare-and-swap operation, which is often provided as a compiler intrinsic.
greyfade
2010-06-01 00:09:03
This question has some discussion on writing `atomic<>` and similar functions: http://stackoverflow.com/questions/1158374/portable-compare-and-swap-atomic-operations-c-c-library
greyfade
2010-06-01 00:13:41
thx, I ended up writing my own atomic<>, I have one more question about CACHE_LINE_SIZE on my new post below..
uray
2010-06-01 00:27:09
A:
If you're looking for a good lock free queue implementation both Microsoft Visual Studio 2010 & Intel's Thread Building Blocks contain a good LF queue which is similar to the paper.
Rick
2010-05-31 22:46:03
+1
A:
I want to conclude the answer given by greyfade, which is based on http://www.drdobbs.com/high-performance-computing/212201163 (the last part of the article), the optimized code would be (with some modification to suit my naming and coding convention) : `
template <typename T> class LFQueue {
private:
struct LFQNode {
LFQNode( T* val ) : value(val), next(nullptr) { }
T* value;
AtomicPtr<LFQNode> next;
char pad[CACHE_LINE_SIZE - sizeof(T*) - sizeof(AtomicPtr<LFQNode>)];
};
char pad0[CACHE_LINE_SIZE];
LFQNode* first; // for one consumer at a time
char pad1[CACHE_LINE_SIZE - sizeof(LFQNode*)];
InterlockedFlag consumerLock; // shared among consumers
char pad2[CACHE_LINE_SIZE - sizeof(InterlockedFlag)];
LFQNode* last; // for one producer at a time
char pad3[CACHE_LINE_SIZE - sizeof(LFQNode*)];
InterlockedFlag producerLock; // shared among producers
char pad4[CACHE_LINE_SIZE - sizeof(InterlockedFlag)];
public:
LFQueue() {
first = last = new LFQNode( nullptr ); // no more divider
producerLock = consumerLock = false;
}
~LFQueue() {
while( first != nullptr ) {
LFQNode* tmp = first;
first = tmp->next;
delete tmp;
}
}
bool pop( T& result ) {
while( consumerLock.set(true) )
{ } // acquire exclusivity
if( first->next != nullptr ) { // if queue is nonempty
LFQNode* oldFirst = first;
first = first->next;
T* value = first->value; // take it out
first->value = nullptr; // of the Node
consumerLock = false; // release exclusivity
result = *value; // now copy it back
delete value; // and clean up
delete oldFirst; // both allocations
return true; // and report success
}
consumerLock = false; // release exclusivity
return false; // queue was empty
}
bool push( const T& t ) {
LFQNode* tmp = new LFQNode( t ); // do work off to the side
while( producerLock.set(true) )
{ } // acquire exclusivity
last->next = tmp; // A: publish the new item
last = tmp; // B: not "last->next"
producerLock = false; // release exclusivity
return true;
}
};
`
another question is how do you define CACHE_LINE_SIZE? its vary on ever CPUs right?
uray
2010-06-01 00:13:00
A good number to choose would be `64` bytes, I think. But you'll probably want to balance it with size, so I'd suggest looking at your target CPUs and choose a size that fits the most common ones you expect to target.
greyfade
2010-06-01 01:54:42
Just a quick note: this is not a forum, so people can't be assume to "browse the thread". If you wish to ask another question, you should better use the "Ask Question" field rather than the "Your Answer" one.
Matthieu M.
2010-06-01 06:26:32
I'am indeed re-answering the question, but i was wrong asking in the answer field, I should add new comment under my own new answer. sorry about that.
uray
2010-06-01 11:51:10
I'am done benchmarking the above code against std::queue with CRITICAL_SECTION lock in windows, the lock-free queue is actually 2~3 times _slower_ than implementation of std::queue with lock. do you know why? it is because of linked list?
uray
2010-06-01 11:53:53
sharing benchmark code including what system you're running would be useful here. Also what is the intended usage in your app, that's what matters.
Rick
2010-06-01 20:40:20