views:

401

answers:

6

Hello everyone,

I have currently created a C++ class for a thread safe integer which simply stores an integer privately and has public get a set functions which use a boost::mutex to ensure that only one change at a time can be applied to the integer.

Is this the most efficient way to do it, I have been informed that mutexes are quite resource intensive? The class is used a lot, very rapidly so it could well be a bottleneck...

Googleing C++ Thread Safe Integer returns unclear views and oppinions on the thread safety of integer operations on different architectures.

Some say that a 32bit int on a 32bit arch is safe, but 64 on 32 isn't due to 'alignment' Others say it is compiler/OS specific (which I don't doubt).

I am using Ubuntu 9.10 on 32 bit machines, some have dual cores and so threads may be executed simultaneously on different cores in some cases and I am using GCC 4.4's g++ compiler.

Thanks in advance...

Please Note: The answer I have marked as 'correct' was most suitable for my problem - however there are some excellent points made in the other answers and they are all worth reading!

+4  A: 

C++ has no real atomic integer implementation, neither do most common libraries.

Consider the fact that even if said implementation would exist, it would have to rely on some sort of mutex - due to the fact that you cannot guarantee atomic operations across all architectures.

Yuval A
+6  A: 

There is the C++0x atomic library, and there is also a Boost.Atomic library under development that use lock free techniques.

Vicente Botet Escriba
+4  A: 

As you're using GCC, and depending on what operations you want to perform on the integer, you might get away with GCC's atomic builtins.

These might be a bit faster than mutexes, but in some cases still a lot slower than "normal" operations.

janneb
+3  A: 

It's not compiler and OS specific, it's architecture specific. The compiler and OS come into it because they're the tools you work through, but they're not the ones setting the real rules. This is why the C++ standard won't touch the issue.

I have never in my life heard of an 64-bit integer write, which can be split into two 32-bit writes, being interrupted halfway through. (Yes, that's an invitation to others to post counterexamples.) Specifically, I have never heard of a CPU's load/store unit allowing a misaligned write to be interrupted; an interrupting source has to wait for the whole misaligned access to complete.

To have an interruptible load/store unit, its state would have to be saved to the stack... and the load/store unit is what saves the rest of the CPU's state to the stack. This would be hugely complicated, and bug prone, if the load/store unit were interruptible... and all that you would gain is one cycle less latency in responding to interrupts, which, at best, is measured in tens of cycles. Totally not worth it.

Back in 1997, A coworker and I wrote a C++ Queue template which was used in a multiprocessing system. (Each processor had its own OS running, and its own local memory, so these queues were only needed for memory shared between processors.) We worked out a way to make the queue change state with a single integer write, and treated this write as an atomic operation. Also, we required that each end of the queue (i.e. the read or write index) be owned by one and only one processor. Thirteen years later, the code is still running fine, and we even have a version that handles multiple readers.

Still, if you want to treat a 64-bit integer write as atomic, align the field to a 64-bit bound. Why worry?

EDIT: For the case you mention in your comment, I'd need more information to be sure, so let me give an example of something that could be implemented without specialized synchronization code.

Suppose you have N writers and one reader. You want the writers to be able to signal events to the reader. The events themselves have no data; you just want an event count, really.

Declare a structure for the shared memory, shared between all writers and the reader:

#include <stdint.h>
struct FlagTable
{   uint32_t flag[NWriters];
};

(Make this a class or template or whatever as you see fit.)

Each writer needs to be told its index and given a pointer to this table:

class Writer
{public:
    Writer(FlagTable* flags_, size_t index_): flags(flags_), index(index_) {}
    void SignalEvent(uint32_t eventCount = 1);
private:
    FlagTable* flags;
    size_t index;
}

When the writer wants to signal an event (or several), it updates its flag:

void Writer::SignalEvent(uint32_t eventCount)
{   // Effectively atomic: only one writer modifies this value, and
    // the state changes when the incremented value is written out.
    flags->flag[index] += eventCount;
}

The reader keeps a local copy of all the flag values it has seen:

class Reader
{public:
    Reader(FlagTable* flags_): flags(flags_)
    {   for(size_t i = 0; i < NWriters; ++i)
            seenFlags[i] = flags->flag[i];
    }
    bool AnyEvents(void);
    uint32_t CountEvents(int writerIndex);
private:
    FlagTable* flags;
    uint32_t seenFlags[NWriters];
}

To find out if any events have happened, it just looks for changed values:

bool Reader::AnyEvents(void)
{   for(size_t i = 0; i < NWriters; ++i)
        if(seenFlags[i] != flags->flag[i])
            return true;
    return false;
}

If something happened, we can check each source and get the event count:

uint32_t Reader::CountEvents(int writerIndex)
{   // Only read a flag once per function call.  If you read it twice,
    // it may change between reads and then funny stuff happens.
    uint32_t newFlag = flags->flag[i];
    // Our local copy, though, we can mess with all we want since there
    // is only one reader.
    uint32_t oldFlag = seenFlags[i];
    // Next line atomically changes Reader state, marking the events as counted.
    seenFlags[i] = newFlag;
    return newFlag - oldFlag;
}

Now the big gotcha in all this? It's nonblocking, which is to say that you can't make the Reader sleep until a Writer writes something. The Reader has to choose between sitting in a spin-loop waiting for AnyEvents() to return true, which minimizes latency, or it can sleep a bit each time through, which saves CPU but could let a lot of events build up. So it's better than nothing, but it's not the solution to everything.

Using actual synchronization primitives, one would only need to wrap this code with a mutex and condition variable to make it properly blocking: the Reader would sleep until there was something to do. Since you used atomic operations with the flags, you could actually keep the amount of time the mutex is locked to a minimum: the Writer would only need to lock the mutex long enough to send the condition, and not set the flag, and the reader only needs to wait for the condition before calling AnyEvents() (basically, it's like the sleep-loop case above, but with a wait-for-condition instead of a sleep call).

Mike D.
I am currently using 32bit ints but could you elaborate on what you mean by "align the field to a 64-bit bound".Don't know if this clarifies my needs but: there are multiple threads writing (incrementing) and one reader.
Paul Ridgway
If you have a 64-bit integer, then you want to align it to a 64-bit boundary, i.e. a byte address evenly divisible by `sizeof(uint64_t)` (or whatever you're trying to align). See http://en.wikipedia.org/wiki/Data_structure_alignment for more details.
Mike D.
Sweet... I expanded my answer and got downvoted without explanation.
Mike D.
With more detail this is easier to understand and implement - thanks!
Paul Ridgway
+3  A: 

For full, general purpose synchronization, as others have already mentioned, the traditional synchronization tools are pretty much required. However, for certain special cases it is possible to take advantage of hardware optimizations. Specifically, most modern CPUs support atomic increment & decrement on integers. The GLib library has pretty good cross-platform support for this. Essentially, the library wraps CPU & compiler specific assembly code for these operations and defaults to mutex protection where they're not available. It's certainly not very general-purpose but if you're only interested in maintaining a counter, this might be sufficient.

Rakis
+2  A: 

you can also have a look at the atomic ops section of intels Thread Building Blocks or the atomic_ops project

Necrolis