ansaurus

Question

Can a C# blocking FIFO queue leak messages? What's wrong in my code?

Answer 1

+4 A:

I must say, this struck me as a very clever idea, and I thought about it for a while before I started to realize where (I think) the bug is here. So, on one hand, kudos on coming up with such a clever design! But, at the same time, shame on you for demonstrating "Kernighan's Law":

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

The issue is basically this: you are assuming that the WaitOne and Release calls effectively serialize all of your Enqueue and Dequeue operations; but that isn't quite what is going on here. Remember that the Semaphore class is used to restrict the number of threads accessing a resource, not to ensure a particular order of events. What happens between each WaitOne and Release is not guaranteed to occur in the same "thread-order" as the WaitOne and Release calls themselves.

This is tricky to explain in words, so let me try to provide a visual illustration.

Let's say your queue has a capacity of 8 and looks like this (let 0 represent null and x represent an object):

[ x x x x x x x x ]

So Enqueue has been called 8 times and the queue is full. Therefore your _writeSema semaphore will block on WaitOne, and your _readSema semaphore will return immediately on WaitOne.

Now let's suppose Dequeue is called more or less concurrently on 3 different threads. Let's call these T1, T2, and T3.

Before proceeding let me apply some labels to your Dequeue implementation, for reference:

public T Dequeue()
{
    _readSema.WaitOne();                                   // A
    int index = Interlocked.Increment(ref _tail);          // B
    index %= _capacity;
    if (index < 0) index += _capacity;
    T ret = Interlocked.Exchange(ref _array[index], null); // C
    Interlocked.Decrement(ref _count);
    _writeSema.Release();                                  // D

    return ret;
}

OK, so T1, T2, and T3 have all gotten past point A. Then for simplicity let's suppose they each reach line B "in order", so that T1 has an index of 0, T2 has an index of 1, and T3 has an index of 2.

So far so good. But here's the gotcha: there is no guarantee that from here, T1, T2, and T3 are going to get to line D in any specified order. Suppose T3 actually gets ahead of T1 and T2, moving past line C (and thus setting _array[2] to null) and all the way to line D.

After this point, _writeSema will be signaled, meaning you have one slot available in your queue to write to, right? But your queue now looks like this!

[ x x 0 x x x x x ]

So if another thread has come along in the meantime with a call to Enqueue, it will actually get past _writeSema.WaitOne, increment _head, and get an index of 0, even though slot 0 is not empty. The result of this will be that the item in slot 0 could actually be overwritten, before T1 (remember him?) reads it.
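The interleaving described above can be replayed deterministically on a single thread. This is only a minimal sketch, not the asker's class: the name `DequeueRaceReplay`, the string items, and the starting counter values are assumptions for illustration, and the Enqueue body (an unconditional `Interlocked.Exchange` into the slot) is assumed to mirror the Dequeue shown earlier. The semaphores are left out because the point is what happens between the WaitOne and Release calls.

```csharp
using System;
using System.Threading;

public static class DequeueRaceReplay
{
    // Replays the T1/T2/T3 interleaving step by step on one thread.
    // Returns the value T1 finally reads from slot 0.
    public static string Run()
    {
        const int capacity = 8;
        string[] array = new string[capacity];
        for (int i = 0; i < capacity; i++) array[i] = "old" + i; // full queue

        // Assumption: both counters start at -1, so 8 enqueues leave head at 7.
        int head = capacity - 1;
        int tail = -1;

        // Step B on three readers, in order: T1 gets 0, T2 gets 1, T3 gets 2.
        int t1 = Interlocked.Increment(ref tail) % capacity;
        int t2 = Interlocked.Increment(ref tail) % capacity;
        int t3 = Interlocked.Increment(ref tail) % capacity;

        // T3 races ahead: step C empties slot 2, and step D would release _writeSema.
        Interlocked.Exchange(ref array[t3], null);

        // A writer woken by that release takes the next head index: slot 0, still occupied.
        int w = Interlocked.Increment(ref head) % capacity;
        Interlocked.Exchange(ref array[w], "new"); // assumed Enqueue body: unconditional store

        // T1 finally reaches step C and reads the writer's item; "old0" is lost.
        return Interlocked.Exchange(ref array[t1], null);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "new"
    }
}
```

T1 was supposed to receive "old0", but that item was overwritten before T1 reached step C, so it never leaves the queue.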

To understand where your null values are coming from, you need only to visualize the reverse of the process I just described. That is, suppose your queue looks like this:

[ 0 0 0 0 0 0 0 0 ]

Three threads, T1, T2, and T3, all call Enqueue nearly simultaneously. T3 increments _head last but inserts its item (at _array[2]) and calls _readSema.Release first, resulting in a signaled _readSema but a queue looking like:

[ 0 0 x 0 0 0 0 0 ]

So if another thread has come along in the meantime with a call to Dequeue (before T1 and T2 are finished doing their thing), it will get past _readSema.WaitOne, increment _tail, and get an index of 0, even though slot 0 is empty.
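The reverse case can be replayed the same way. Again, this is a hedged sketch rather than the original code: `EnqueueRaceReplay`, the string items, and the counters starting at -1 are assumptions made for illustration, and the semaphores are omitted.

```csharp
using System;
using System.Threading;

public static class EnqueueRaceReplay
{
    // Replays three overlapping Enqueue calls and one early Dequeue on one thread.
    // Returns whatever the early reader gets from its slot.
    public static string Run()
    {
        const int capacity = 8;
        string[] array = new string[capacity]; // empty queue: every slot is null
        int head = -1, tail = -1;              // assumption: counters start at -1

        // Three writers claim slots 0, 1 and 2 in order.
        int w1 = Interlocked.Increment(ref head) % capacity;
        int w2 = Interlocked.Increment(ref head) % capacity;
        int w3 = Interlocked.Increment(ref head) % capacity;

        // T3 stores first and would now release _readSema.
        Interlocked.Exchange(ref array[w3], "c");

        // A reader woken by that release takes the next tail index: slot 0, still empty.
        int r = Interlocked.Increment(ref tail) % capacity;
        string got = Interlocked.Exchange(ref array[r], null);

        // T1 and T2 store too late for that reader.
        Interlocked.Exchange(ref array[w1], "a");
        Interlocked.Exchange(ref array[w2], "b");

        return got;
    }

    public static void Main()
    {
        Console.WriteLine(Run() == null); // prints "True": the reader got null
    }
}
```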

So there's your problem. As for a solution, I don't have any suggestions at the moment. Give me some time to think it over... (I'm posting this answer now because it's fresh in my mind and I feel it might help you.)

Dan Tao 2010-10-10 02:42:23

So with this design, no readers can be allowed to proceed while any writers are in Enqueue, and no writers can be allowed to proceed while any readers are in Dequeue.

Les 2010-10-10 03:07:39

Thank you Dan!!! Kudos to you for making me realize the terrible design flaw in my code! Now that I think about it, the important thing is that the semaphores must be released in the order in which threads increment the pointers, but there is no constraint on write order. However, I must remember that T[] (when T is a class) holds a **reference** in each cell, i.e. a pointer, and copying a reference is a fast, atomic operation. So I believe locking the whole method once, rather than using complex mutex schemes, will perform better. If you have more ideas, here I am ;)

djechelon 2010-10-10 11:59:52

Answer 2

+2 A:

(+1 to Dan Tao who I vote has the answer) The enqueue would be changed to something like this...

while (Interlocked.CompareExchange(ref _array[index], item, null) != null)
    ;

The dequeue would be changed to something like this...

while( (ret = Interlocked.Exchange(ref _array[index], null)) == null)
    ;

This builds upon Dan Tao's excellent analysis. Because the indexes are obtained atomically, each index belongs to exactly one caller at a time. So (assuming no threads die or terminate inside the Enqueue or Dequeue methods) a reader is guaranteed to eventually have his cell filled in, and a writer is guaranteed to eventually have his cell freed (set to null).
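To see why these two loops are safe, it helps to look at what a single attempt does. The sketch below is illustrative only: `SlotClaimDemo`, `TryStore`, and `TryTake` are hypothetical names, not part of the code under discussion. `Interlocked.CompareExchange` returns the value that was in the slot before the call, so a non-null result means the slot was occupied and nothing was written.

```csharp
using System;
using System.Threading;

public static class SlotClaimDemo
{
    // One attempt of the enqueue loop: stores item only if the slot is empty (null).
    // Returns true if the store happened.
    public static bool TryStore(string[] slots, int index, string item)
    {
        return Interlocked.CompareExchange(ref slots[index], item, null) == null;
    }

    // One attempt of the dequeue loop: empties the slot and returns what was in it.
    // A null result means the slot had not been filled yet.
    public static string TryTake(string[] slots, int index)
    {
        return Interlocked.Exchange(ref slots[index], null);
    }

    public static void Main()
    {
        string[] slots = { "old", null };
        Console.WriteLine(TryStore(slots, 0, "new")); // False: slot 0 is occupied, "old" is kept
        Console.WriteLine(TryStore(slots, 1, "new")); // True: slot 1 was empty
        Console.WriteLine(TryTake(slots, 0));         // old
    }
}
```

Looping on these attempts means a writer never overwrites an unread item and a reader never returns null.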

Les 2010-10-10 03:40:14

Answer 3

A:

Thank you Dan Tao and Les,

I really appreciate your help. Dan, you opened my mind: it doesn't matter how many producers/consumers are inside the critical section; what matters is that the locks are released in order. Les, you found the solution to the problem.

Now it's time to finally answer my own question with the final code I wrote thanks to the help of both of you. Well, it's not much, but it's a small enhancement of Les's code.

Enqueue:

while (Interlocked.CompareExchange(ref _array[index], item, null) != null)
    Thread.Sleep(0);

Dequeue:

while ((ret = Interlocked.Exchange(ref _array[index], null)) == null)
    Thread.Sleep(0);

Why Thread.Sleep(0)? As you know, atomic operations are very fast. In a queue of T where T is a reference type, the elements are just pointers, which can be read or stored atomically with a single CPU instruction. I don't have sophisticated tools to verify this myself, but I believe Les's code may lead to a thread monopolizing the CPU in the while loop until the system timer says it's time for a context switch. So, when I discover that I can't write a cell because it's not empty (or can't read one because it is), I force a context switch with a code snippet found at http://progfeatures.blogspot.com/2009/05/how-to-force-thread-to-perform-context.html
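For readers who want to see the pieces of this thread in one place, here is a minimal sketch of how they might fit together. It is not the FastFifoQueue code linked below: the class name `SketchFifoQueue`, the constructor, and the counters starting at -1 are assumptions, and the `_count` field from the original Dequeue is omitted. It targets the two symptoms discussed here (lost items and null reads); it may still not preserve strict FIFO order if a stalled writer is overtaken by a writer a full lap ahead of it.

```csharp
using System;
using System.Threading;

public sealed class SketchFifoQueue<T> where T : class
{
    private readonly T[] _array;
    private readonly int _capacity;
    private readonly Semaphore _readSema;   // counts items available to read
    private readonly Semaphore _writeSema;  // counts slots available to write
    private int _head = -1;                 // assumption: counters start at -1
    private int _tail = -1;

    public SketchFifoQueue(int capacity)
    {
        _capacity = capacity;
        _array = new T[capacity];
        _readSema = new Semaphore(0, capacity);
        _writeSema = new Semaphore(capacity, capacity);
    }

    public void Enqueue(T item)
    {
        // null marks an empty slot, so it cannot be stored as an item.
        if (item == null) throw new ArgumentNullException("item");

        _writeSema.WaitOne();
        int index = Interlocked.Increment(ref _head) % _capacity;
        if (index < 0) index += _capacity;

        // Wait until the reader that owns this slot has emptied it.
        while (Interlocked.CompareExchange(ref _array[index], item, null) != null)
            Thread.Sleep(0);

        _readSema.Release();
    }

    public T Dequeue()
    {
        _readSema.WaitOne();
        int index = Interlocked.Increment(ref _tail) % _capacity;
        if (index < 0) index += _capacity;

        // Wait until the writer that owns this slot has filled it.
        T ret;
        while ((ret = Interlocked.Exchange(ref _array[index], null)) == null)
            Thread.Sleep(0);

        _writeSema.Release();
        return ret;
    }
}

public static class SketchDemo
{
    public static void Main()
    {
        var queue = new SketchFifoQueue<string>(4);
        queue.Enqueue("a");
        queue.Enqueue("b");
        Console.WriteLine(queue.Dequeue()); // a
        Console.WriteLine(queue.Dequeue()); // b
    }
}
```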

I also tested the code of the previous test case to get proof of my claims:

without sleep(0)

Read 6164150 elements
Wrote 6322541 elements
Read 5885192 elements
Wrote 5785144 elements
Wrote 6439924 elements
Read 6497471 elements

with sleep(0)

Wrote 7135907 elements
Read 6361996 elements
Wrote 6761158 elements
Read 6203202 elements
Wrote 5257581 elements
Read 6587568 elements

I know this is not a "great" discovery, and I will win no Turing prize for these numbers. The performance gain is not substantial, but forcing a context switch allows more read/write operations to be performed. (To be clear: my test evaluates the performance of the queue rather than simulating a producer/consumer problem, so it doesn't matter if there are still elements in the queue when the test ends after a minute.) Still, I have just demonstrated that my approach works, thanks to you all.

Code available open source as MS-RL: http://logbus-ng.svn.sourceforge.net/viewvc/logbus-ng/trunk/logbus-core/It.Unina.Dis.Logbus/Utils/FastFifoQueue.cs?revision=461&view=markup

djechelon 2010-10-12 22:28:23
