I'm studying the lock-free (en-, de-)queue algorithms of Michael and Scott. The problem is that I can't explain or understand a couple of lines, and the paper doesn't explain them either, apart from the comments in the code itself.

Enqueue:

  enqueue(Q: pointer to queue_t, value: data type)
   E1:   node = new_node()        // Allocate a new node from the free list
   E2:   node->value = value      // Copy enqueued value into node
   E3:   node->next.ptr = NULL    // Set next pointer of node to NULL
   E4:   loop                     // Keep trying until Enqueue is done
   E5:      tail = Q->Tail        // Read Tail.ptr and Tail.count together
   E6:      next = tail.ptr->next // Read next ptr and count fields together
   E7:      if tail == Q->Tail    // Are tail and next consistent?
               // Was Tail pointing to the last node?
   E8:         if next.ptr == NULL
                  // Try to link node at the end of the linked list
   E9:            if CAS(&tail.ptr->next, next, <node, next.count+1>)
  E10:               break        // Enqueue is done.  Exit loop
  E11:            endif
  E12:         else               // Tail was not pointing to the last node
                  // Try to swing Tail to the next node
  E13:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
  E14:         endif
  E15:      endif
  E16:   endloop
         // Enqueue is done.  Try to swing Tail to the inserted node
  E17:   CAS(&Q->Tail, tail, <node, tail.count+1>)

Why is E7 needed? Does correctness depend on it? Or is it merely an optimization? This if can fail if another thread successfully executed E17, or D10 below, (and changed Q->Tail) while the first thread has executed E5 but not yet E7. But what if E17 is executed right after the first thread executes E7?

edit: Does this last sentence prove that E7 cannot be more than an optimization? My intuition is that it does, since I give a scenario where "apparently" the if statement would make the wrong decision, yet the algorithm would still be supposed to work correctly. But then we could replace the if's condition with a random condition without affecting correctness. Any hole in this argument?
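For reference, here is a minimal, hypothetical Java translation of the enqueue side (the class and helper names are mine, not from the paper or from ConcurrentLinkedQueue). In Java the <ptr, count> pairs are unnecessary: garbage collection prevents the ABA problem, so a plain AtomicReference stands in for each counted pointer, and E7 becomes a simple re-read of the tail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the Michael & Scott enqueue in Java.
final class MSEnqueueSketch<T> {
    static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T value) { this.value = value; }
    }

    private final Node<T> dummy;                  // sentinel head node
    private final AtomicReference<Node<T>> tail;

    MSEnqueueSketch() {
        dummy = new Node<>(null);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T value) {
        Node<T> node = new Node<>(value);         // E1-E3
        while (true) {                            // E4
            Node<T> t = tail.get();               // E5
            Node<T> next = t.next.get();          // E6
            if (t == tail.get()) {                // E7: t and next consistent?
                if (next == null) {               // E8: t was the last node
                    if (t.next.compareAndSet(null, node)) { // E9
                        tail.compareAndSet(t, node);        // E17: swing Tail
                        return;                             // E10
                    }
                } else {
                    tail.compareAndSet(t, next);  // E13: help a lagging Tail
                }
            }
        }
    }

    List<T> toList() {                            // test helper, not in the paper
        List<T> out = new ArrayList<>();
        for (Node<T> n = dummy.next.get(); n != null; n = n.next.get())
            out.add(n.value);
        return out;
    }
}
```

Note that E17 is allowed to fail here: if the CAS loses, some other thread has already swung Tail past the inserted node, which is exactly the "helping" that keeps the algorithm lock-free.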

Dequeue:

dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
   D1:   loop                          // Keep trying until Dequeue is done
   D2:      head = Q->Head             // Read Head
   D3:      tail = Q->Tail             // Read Tail
   D4:      next = head.ptr->next      // Read Head.ptr->next
   D5:      if head == Q->Head         // Are head, tail, and next consistent?
   D6:         if head.ptr == tail.ptr // Is queue empty or Tail falling behind?
   D7:            if next.ptr == NULL  // Is queue empty?
   D8:               return FALSE      // Queue is empty, couldn't dequeue
   D9:            endif
                  // Tail is falling behind.  Try to advance it
  D10:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
  D11:         else                    // No need to deal with Tail
                  // Read value before CAS
                  // Otherwise, another dequeue might free the next node
  D12:            *pvalue = next.ptr->value
                  // Try to swing Head to the next node
  D13:            if CAS(&Q->Head, head, <next.ptr, head.count+1>)
  D14:               break             // Dequeue is done.  Exit loop
  D15:            endif
  D16:         endif
  D17:      endif
  D18:   endloop
  D19:   free(head.ptr)                // It is safe now to free the old node
  D20:   return TRUE                   // Queue was not empty, dequeue succeeded

Again, why is D5 needed? Correctness or optimization? I'm not sure what "consistency" these tests provide, since it seems the values can become inconsistent again right after the if succeeds.
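A minimal, hypothetical Java sketch of both operations (again, my own naming, not the ConcurrentLinkedQueue code): garbage collection removes the need for the counted pointers, and the explicit free() at D19 disappears because the old dummy node simply becomes unreachable.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the full Michael & Scott queue in Java.
final class MSQueueSketch<T> {
    static final class Node<T> {
        T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    MSQueueSketch() {
        Node<T> dummy = new Node<>(null);         // queue starts with a dummy
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> t = tail.get();                    // E5
            Node<T> next = t.next.get();               // E6
            if (t == tail.get()) {                     // E7
                if (next == null) {                    // E8
                    if (t.next.compareAndSet(null, node)) { // E9
                        tail.compareAndSet(t, node);   // E17
                        return;
                    }
                } else {
                    tail.compareAndSet(t, next);       // E13
                }
            }
        }
    }

    T dequeue() {                                      // null means queue empty
        while (true) {                                 // D1
            Node<T> h = head.get();                    // D2
            Node<T> t = tail.get();                    // D3
            Node<T> next = h.next.get();               // D4
            if (h == head.get()) {                     // D5: consistent snapshot?
                if (h == t) {                          // D6
                    if (next == null) return null;     // D7-D8: empty
                    tail.compareAndSet(t, next);       // D10: help Tail along
                } else {
                    T value = next.value;              // D12: read before CAS
                    if (head.compareAndSet(h, next)) { // D13
                        next.value = null;             // new dummy; drop the reference
                        return value;                  // D20
                    }
                }
            }
        }
    }
}
```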

This looks like a standard technique. Can someone explain the motivation behind it? To me, it seems the intention is to avoid doing an (expensive) CAS in the few cases where it can be noticed that it would definitely fail, but at the cost of always doing an extra read, which isn't supposed to be much cheaper itself. (E.g. in Java, Q->Tail would have to be volatile, so that we know we are reading the real thing and not a copy cached in a register; that translates into prepending the read with a fence of some sort.) So I'm not sure what's really going on here... thanks.

edit: This has been ported to Java, more precisely to ConcurrentLinkedQueue; e.g. E7 is line 194, while D5 is line 212.

A:

Why is E7 needed?

It's mostly an optimization.

Consider two threads trying to enqueue at the same time. Both get to E5, but before thread 1 gets to E7, thread 2 successfully enqueues. When thread 1 reaches E7 it will observe tail == Q->Tail to be false and retry. This avoids a costly CAS. Of course it's not foolproof, because E7 can also succeed just before thread 2 enqueues, in which case the CAS eventually fails and thread 1 has to retry anyway.

why D5 is needed

Same reasoning as for E7.

Again, both functions would work without E7 and D5. Presumably some benchmarking showed that under moderate contention the double check increases throughput (this is more an observation than a fact).

Edit:

I went and read the paper on this queue a bit more. The check is also there for the lock-freedom (progress) argument of the algorithm, not just for the consistency of the data structure's state:

The lock-free algorithm is non-blocking because if there are non-delayed processes attempting to perform operations on the queue, an operation is guaranteed to complete within finite time. An enqueue operation loops only if the condition in line E7 fails, the condition in line E8 fails, or the compare and swap in line E9 fails. A dequeue operation loops only if the condition in line D5 fails, the condition in line D6 holds (and the queue is not empty), or the compare and swap in line D13 fails. We show that the algorithm is non-blocking by showing that a process loops beyond a finite number of times only if another process completes an operation on the queue.

http://www.cs.rochester.edu/u/scott/papers/1996_PODC_queues.pdf

John V.
The intention of that passage is to enumerate all the causes that might make a thread loop, and to show that in every one of these cases another thread succeeded in completing an operation. Thus, if E7 and D5 were absent, this argument would be even simpler, not wrong, so it still leaves us with the "it's just an optimization" story, with which I tend to agree...
Dimitris Andreou
Yes Dimitris I agree with you completely.
John V.
Unless we are both proven wrong, I'll mark this answer as correct :)
Dimitris Andreou