I have written the following atomic template with a view to mimicking the atomic operations that will be available in the upcoming c++0x standard.

However, I am not sure that the __sync_synchronize() call I have around the return of the underlying value is necessary.

From my understanding, __sync_synchronize() is a full memory barrier and I'm not sure I need such a costly call when returning the object value.

I'm pretty sure it'll be needed around the setting of the value, but I could also implement that with the assembly:

__asm__ __volatile__ ( "rep;nop": : :"memory" );

Does anyone know whether I definitely need the __sync_synchronize() on the return of the object's value?

M.

template < typename T >
struct atomic
{
private:
    volatile T obj;

public:
    atomic( const T & t ) :
        obj( t )
    {
    }

    inline operator T()
    {
        __sync_synchronize();   // Not sure this is overkill
        return obj;
    }

    inline atomic< T > & operator=( T val )
    {
        __sync_synchronize();   // Not sure if this is overkill
        obj = val;
        return *this;
    }

    inline T operator++()
    {
        return __sync_add_and_fetch( &obj, (T)1 );
    }

    inline T operator++( int )
    {
        return __sync_fetch_and_add( &obj, (T)1 );
    }

    inline T operator+=( T val )
    {
        return __sync_add_and_fetch( &obj, val );
    }

    inline T operator--()
    {
        return __sync_sub_and_fetch( &obj, (T)1 );
    }

    inline T operator--( int )
    {
        return __sync_fetch_and_sub( &obj, (T)1 );
    }

    inline T operator-=( T val )
    {
        return __sync_sub_and_fetch( &obj, val );
    }

    // Perform an atomic CAS operation
    // returning the value before the operation
    inline T exchange( T oldVal, T newVal )
    {
        return __sync_val_compare_and_swap( &obj, oldVal, newVal );
    }

};
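
For context, a minimal sketch of how I intend to use the template (the counter/worker names are just made up for illustration):

atomic< int > counter( 0 );

void * worker( void * )
{
    for ( int i = 0; i < 100000; ++i )
        ++counter;              // atomic increment via __sync_add_and_fetch
    return 0;
}

// main() would spawn a few threads running worker(), join them and
// then read the final count:
//     int total = counter;    // this is the read I'm asking about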

Update: I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations.

A: 
inline operator T()
{
    __sync_synchronize();   // Not sure this is overkill
    return obj;
}

Short version: This IS overkill.

Long version:

Why do you want to implement this class as a template at all? It doesn't make sense, because the atomic operations are only allowed on integer types of 1 to 8 bytes, and you can't even be sure that an 8-byte integer is supported on all platforms.

You should implement your atomic class as a non-template version and use the "native" integer type of your hardware/system. This is int32_t on 32-bit processors/OSes and int64_t on 64-bit systems, e.g.:

#ifdef ...
typedef ... native_int_type;
#endif
// verify that you chose the correct integer type
BOOST_STATIC_ASSERT(sizeof(native_int_type) == sizeof(void*));

BOOST_STATIC_ASSERT is the straightforward counterpart of "static_assert()" from c++0x.

If you're using the "perfect fit" integer type you can write the operator just like that:

operator native_int_type() { return obj; }

Because obj is volatile it's guaranteed to fetch the value and not return any cached value. And because you're using the "native" integer type you can be sure that reading such a value is atomic.

atomic& operator=( native_int_type val )

Again, you don't need a synchronize if you're using the right integer type. Reading/setting an int32 on an Intel 32-bit system is atomic, and so is reading/setting an int64 on a 64-bit system.

I don't see any benefit in implementing atomic as a template. Atomic operations are platform dependent. It's better to offer an "atomic_int" class that just guarantees to be at least 4 bytes (if you're supporting 32-bit and 64-bit systems) and an "atomic_pointer" if you need it. This way the name of the class also implies a semantic and a purpose.

If you just use "atomic", then one could think: "Wow, I just have to put my string class in this template and then it's thread safe!"
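
As a rough sketch of what such a non-template class could look like (the platform detection here is simplified and only illustrative):

#include <stdint.h>

#if defined(__LP64__) || defined(__x86_64__)
typedef int64_t native_int_type;   // 64-bit systems
#else
typedef int32_t native_int_type;   // 32-bit systems
#endif

class atomic_int
{
private:
    volatile native_int_type obj;

public:
    explicit atomic_int( native_int_type v = 0 ) : obj( v ) {}

    // plain aligned loads/stores of the native word size are atomic;
    // volatile keeps the compiler from caching the value in a register
    operator native_int_type() { return obj; }
    atomic_int & operator=( native_int_type v ) { obj = v; return *this; }

    // read-modify-write operations still go through the __sync builtins
    native_int_type operator++() { return __sync_add_and_fetch( &obj, 1 ); }
    native_int_type operator--() { return __sync_sub_and_fetch( &obj, 1 ); }
};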


Edit: to answer your update: "I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations."

To prevent the compiler and the CPU from reordering read/write operations you need the __sync_synchronize().

But note that acquire/release semantics may yield better performance than full barriers.
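
For example, gcc's __sync_lock_test_and_set() is documented as only an acquire barrier and __sync_lock_release() as only a release barrier, which is already enough for something like a simple spinlock (a sketch, not drop-in code):

static volatile int lock_word = 0;

void spin_lock()
{
    // atomic exchange of 1 into lock_word; the builtin is an
    // acquire barrier only, not a full barrier
    while ( __sync_lock_test_and_set( &lock_word, 1 ) )
        ;   // spin until the previous value was 0
}

void spin_unlock()
{
    // writes 0 back with release semantics, so the accesses made
    // while holding the lock cannot sink below it
    __sync_lock_release( &lock_word );
}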


Edit2:

inline atomic< T > & operator=( T val )
{
    __sync_synchronize();   // Not sure if this is overkill
    obj = val;
    return *this;
}

What do you want to prevent from being reordered? In most cases you want to write this:

    obj = val;
    __sync_synchronize();

instead, because you want to be sure that the value has been written by the time you return from the function.

neverlord
Just because an operation is atomic doesn't mean memory barriers are unnecessary. Without them, the read/write might be cached and delayed for an unspecified amount of time. Or it might get reordered, changing the semantics of your code.
jalf
@neverlord. I agree with your appraisal of implementing the atomic ops as templates, and the only reason I did was because, as I understand it, c++0x defines its atomics as "atomic< int >" etc. I agree that someone looking at the code may well think that "atomic< string >" is valid. However, I was going to make sure that any type handed in for "T" would be an integer type and within the range allowed for the host. I just hadn't got that far yet :o)
ScaryAardvark
@jalf. This is what I was driving at. I want to make sure that the value is consistent in the face of optimisations. Are you saying that this extra "__sync_synchronize()" is necessary?
ScaryAardvark
@jalf. Read/write operations on volatile variables are not cached. But reordering might be a problem under some circumstances. I think a good way to handle this is to provide several implementations of atomic operations with different reordering semantics, like QT does: http://doc.trolltech.com/4.6/qatomicint.html But you don't need __sync_synchronize() to ensure that a read/write is atomic in this case.
neverlord
I would say so, yes. Without it, the compiler could inline everything in the class, and then reorder operations which might change the behavior during concurrent accesses to the class. Or the CPU might cache the read/write ops even if the compiler does the right thing. It might depend on whether the other __sync primitives also define a memory barrier implicitly. If they do, it *might* be unnecessary. Check the docs, I guess. :)
jalf
I believe std::atomic<std::string> is valid in c++0x. std::atomic<> should fall back to mutexes when the CPU lacks the necessary instructions (every CPU will lack instructions for atomic string operations).
caspin
@neverlord. Reads/writes on x86 hardware are not always guaranteed to be atomic. If the integer is at an unaligned address then the access will certainly not be atomic.
ScaryAardvark
+1  A: 

First, some petty remarks:

volatile T obj;

volatile is useless here, all the more so since you issue all the barriers yourself.

inline T operator++( int )

inline is unneeded, since it is implied when the method is defined inside the class.

Getters and setters:

inline operator T()
{
    __sync_synchronize();   // (I)
    T tmp=obj;
    __sync_synchronize();   // (II)
    return tmp;
}

inline atomic< T > & operator=( T val )
{
    __sync_synchronize();   // (III)
    obj = val;
    __sync_synchronize();   // (IV)
    return *this;
}

To assure total ordering of the memory accesses on read and write, you need two barriers on each access (like this). I would be happy with only barriers (II) and (III), as they suffice for some uses I came up with (e.g. a pointer/boolean saying data is there, or a spinlock), but, unless specified otherwise, I would not omit the others, because someone might need them (it would be nice if someone could show that some of the barriers can be omitted without restricting possible uses, but I don't think it's possible).

Of course, this would be unnecessarily complicated and slow.

That said, I would just dump the barriers, and even the idea of using the barriers in any place of a similar template. Note that:

  • the ordering semantics of that interface are entirely defined by you; if you decide the interface has the barriers here or there, they must be here or there, period. If you don't define it, you can come up with a more efficient design, because not all barriers, or even full barriers, might be needed for a particular problem.
  • usually, you use atomics when you have a lock-free algorithm that could give you a performance advantage; this means an interface that prematurely pessimizes the accesses will probably be unusable as a building block for it, as it will hamper the performance itself.
  • lock-free algorithms typically contain communication that cannot be encapsulated by one atomic data type, so you need to know what's happening in the algorithm to place the barriers precisely where they belong (e.g. when implementing a lock, you need a barrier after you've acquired it, but before you release it, which are both writes, at least in principle); see the sketch after this list
  • if you don't want problems, and are not sure about placing the barriers explicitly in the algorithm, just use lock-based algorithms. There's nothing bad about that.
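
To illustrate the point about placing barriers in the algorithm rather than inside the atomic type (the producer/consumer names are made up, and the flag loop is deliberately naive):

static int          payload = 0;
static volatile int ready   = 0;

void producer()
{
    payload = 42;            // ordinary write of the shared data
    __sync_synchronize();    // publish the data *before* raising the flag
    ready = 1;
}

void consumer()
{
    while ( !ready )
        ;                    // spin until the flag is seen
    __sync_synchronize();    // keep the payload read below the flag read
    // payload is now guaranteed to be 42
}

Both barriers belong to the algorithm; hiding them inside a wrapper type would either miss one or add ones you don't need.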

BTW, the c++0x interface allows you to specify precise memory ordering constraints.
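
For example, something along these lines (assuming a compiler that already ships <atomic>):

#include <atomic>

std::atomic< int > flag( 0 );
int payload = 0;

void publish()
{
    payload = 42;
    // release store: pairs with the acquire load in try_consume(),
    // no full barrier needed
    flag.store( 1, std::memory_order_release );
}

bool try_consume()
{
    if ( flag.load( std::memory_order_acquire ) != 1 )
        return false;
    // the write to payload is guaranteed to be visible here
    return payload == 42;
}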

jpalecek