views: 3067 | answers: 7
How do you implement an efficient and thread-safe reference counting system on x86 CPUs in the C++ programming language?

I always run into the problem that the critical operations are not atomic, and the available x86 interlocked operations are not sufficient for implementing the reference counting system.

The following article covers this topic, but requires special CPU instructions:

http://www.ddj.com/architect/184401888

+9  A: 

Nowadays, you can use the Boost/TR1 shared_ptr<> smart pointer to keep your reference counted references.

Works great; no fuss, no muss. The shared_ptr<> class takes care of all the locking needed on the refcount.
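
For example, a minimal sketch of what that might look like (the Widget type and worker function are purely illustrative, not part of the question):

    #include <boost/shared_ptr.hpp>

    struct Widget { /* ... */ };

    void worker(boost::shared_ptr<Widget> w)
    {
        // copying or destroying w adjusts the shared reference count atomically;
        // access to the Widget itself still needs your own synchronisation
    }

    int main()
    {
        boost::shared_ptr<Widget> w(new Widget);
        worker(w);    // each copy bumps the count, each destruction drops it
        return 0;
    }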

Michael Burr
Also note that Technical Report 1 (TR1) is a draft of library items proposed for inclusion in the C++0x standard. As C++0x comes closer, these will be even better supported (they are based on the Boost library).
Kris Kumler
I have my doubts about the lock-free implementation, but this is likely still the best choice. Multi-threading is fiendishly hard to do in C++, with all the tricks the compiler may play on your code.
James Dean
Man... either I'm missing something, or something is wrong with this answer. Why did it get accepted and up-voted so many times when it doesn't even answer the original question? It just suggests something else that already has a working implementation of the thing the original question asked how to build.
Paulius Maruška
@Paulius - welcome to StackOverflow ;)
Thomas Hansen
@Paulius - the person asking the question (and others) is probably simply looking for a working implementation that can be readily used.
Michael Burr
+3  A: 

Strictly speaking, you'll need to wait until C++0x to be able to write thread-safe code in pure C++.

For now, you can use POSIX, or create your own platform-independent wrappers around compare-and-swap and/or interlocked increment/decrement.
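
As an illustration only (the atomic_increment/atomic_decrement names are invented for this sketch, which assumes either MSVC or GCC), such a wrapper might look like:

    #if defined(_MSC_VER)
    #include <intrin.h>
    // both return the resulting value, like the Win32 Interlocked functions
    inline long atomic_increment(volatile long* p) { return _InterlockedIncrement(p); }
    inline long atomic_decrement(volatile long* p) { return _InterlockedDecrement(p); }
    #elif defined(__GNUC__)
    inline long atomic_increment(volatile long* p) { return __sync_add_and_fetch(p, 1); }
    inline long atomic_decrement(volatile long* p) { return __sync_sub_and_fetch(p, 1); }
    #endif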

James Hopkin
Or do like everyone else and use Boost::TR1
gnud
+4  A: 

In VC++, you can use _InterlockedCompareExchange.

// assumes a shared "volatile long count" (and #include <intrin.h> on MSVC)
long old_count, new_count;
do {
    old_count = count;             // read the count
    new_count = old_count + 1;     // perform the mathematical operation
} while (_InterlockedCompareExchange(&count, new_count, old_count) != old_count);
// _InterlockedCompareExchange returns the previous value, so the loop retries
// until no other thread changed the count between the read and the exchange.

On other platforms/compilers, use the appropriate intrinsic for the LOCK CMPXCHG instruction that MS's _InterlockedCompareExchange exposes.
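
For example, with GCC's legacy __sync builtins the same loop might be written as follows (a sketch, assuming a GCC-compatible compiler and the same shared volatile long count):

    long old_count, new_count;
    do {
        old_count = count;
        new_count = old_count + 1;
        // __sync_val_compare_and_swap(ptr, expected, new) returns the previous value
    } while (__sync_val_compare_and_swap(&count, old_count, new_count) != old_count);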

moonshadow
Or simply InterlockedIncrement/Decrement.
yrp
A: 

If the instruction itself is not atomic, then you need to make the section of code that updates the appropriate variable a critical section.

i.e. You need to prevent other threads from entering that section of code by using some locking scheme. Of course the locks themselves need to be atomic, and pthread mutexes (pthread_mutex_t and its lock/unlock functions) give you exactly that.

On the question of efficiency: the pthread library is as efficient as it can be while still guaranteeing that the mutex lock is atomic on your OS.

Is it expensive? Probably. But everything that requires a guarantee has a cost.
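
A minimal sketch of such a mutex-protected count (the RefCounted type here is illustrative, not part of the answer):

    #include <pthread.h>

    struct RefCounted
    {
        pthread_mutex_t lock;
        long count;

        RefCounted() : count(1) { pthread_mutex_init(&lock, NULL); }
        ~RefCounted() { pthread_mutex_destroy(&lock); }

        void add_ref()
        {
            pthread_mutex_lock(&lock);     // enter the critical section
            ++count;
            pthread_mutex_unlock(&lock);
        }

        void release()                     // assumes the object was allocated with new
        {
            pthread_mutex_lock(&lock);
            long remaining = --count;
            pthread_mutex_unlock(&lock);
            if (remaining == 0)
                delete this;               // last reference gone
        }
    };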

Martin York
A: 

The code posted in that DDJ article adds extra complexity to account for bugs in how the smart pointers are used.

Specifically, if you can't guarantee that a smart pointer won't change while it is being assigned to another smart pointer, you are doing it wrong or doing something very unreliable to begin with. If a smart pointer can change while it is being assigned to another smart pointer, that means the code doing the assignment doesn't own it, which is suspect in itself.

MSN
+1  A: 

Win32 InterlockedIncrementAcquire and InterlockedDecrementRelease (if you want to be safe and care about platforms with possible reordering, where you need to issue memory barriers at the same time), or InterlockedIncrement and InterlockedDecrement (if you are sure you will stay on x86), are atomic and will do the job.

That said, Boost/TR1 shared_ptr<> will handle all of this for you, so unless you need to implement it on your own, you are probably best off sticking with it.
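
For reference, a hand-rolled count along those lines might look like this (a sketch only; the IntrusiveCounted type is invented for the example, and the plain Interlocked variants assume an x86 target as described above):

    #include <windows.h>

    struct IntrusiveCounted
    {
        volatile LONG refs;

        IntrusiveCounted() : refs(1) {}

        void add_ref()
        {
            InterlockedIncrement(&refs);              // atomically ++refs
        }

        void release()                                // assumes allocation with new
        {
            if (InterlockedDecrement(&refs) == 0)     // atomically --refs, returns new value
                delete this;                          // last reference gone
        }
    };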

Suma
A: 

Bear in mind that this locking is very expensive, and it happens every time you hand objects around between smart pointers, even when the object is currently owned by just one thread (the smart pointer library doesn't know that).

Given this, there may be a rule of thumb applicable here (I'm happy to be corrected!).

If the following things apply to you:

  • You have complex data structures that would be difficult to write destructors for (or where STL-style value semantics would be inappropriate, by design), so you need smart pointers to do it for you, and
  • You're using multiple threads that share these objects, and
  • You care about performance as well as correctness

... then actual garbage collection may be a better choice. Although GC has a bad reputation for performance, it's all relative. I believe it compares very favourably with locking smart pointers. It was an important part of why the CLR team chose true GC instead of something using reference counting. See this article, in particular this stark comparison of what reference assignment means if you have counting going on:

no ref-counting:

a = b;

ref counting:

if (a != null)
    if (InterlockedDecrement(ref a.m_ref) == 0)
        a.FinalRelease();

if (b != null)
    InterlockedIncrement(ref b.m_ref);

a = b;
Daniel Earwicker