views:

1753

answers:

9

Edit: The code here still has some bugs, and it could do better in the performance department, but instead of trying to fix it here, for the record, I took the problem over to the Intel discussion groups and got lots of great feedback. If all goes well, a polished version of an atomic float will be included in a near-future release of Intel's Threading Building Blocks.

OK, here's a tough one: I want an atomic float, not for super-fast graphics performance, but to use routinely as a data member of classes. And I don't want to pay the price of putting locks on these classes, because that provides no additional benefit for my needs.

Now, Intel's TBB and the other atomic libraries I've seen support integer types, but not floating point. So I went ahead and implemented one, and it works... but I'm not sure whether it REALLY works or I'm just very lucky that it works.

Does anyone here know whether this is some form of threading heresy?

typedef unsigned int uint_32;

  struct AtomicFloat
  {
    private:
    tbb::atomic<uint_32> atomic_value_;

    public:
    template<memory_semantics M>
    float fetch_and_store( float value ) 
    {
     const uint_32 value_ = atomic_value_.tbb::atomic<uint_32>::fetch_and_store<M>((uint_32&)value);
     return reinterpret_cast<const float&>(value_);
    }

    float fetch_and_store( float value ) 
    {
     const uint_32 value_ = atomic_value_.tbb::atomic<uint_32>::fetch_and_store((uint_32&)value);
     return reinterpret_cast<const float&>(value_);
    }

    template<memory_semantics M>
    float compare_and_swap( float value, float comparand ) 
    {
     const uint_32 value_ = atomic_value_.tbb::atomic<uint_32>::compare_and_swap<M>((uint_32&)value,(uint_32&)comparand);
     return reinterpret_cast<const float&>(value_);
    }

    float compare_and_swap(float value, float compare)
    {
     const uint_32 value_ = atomic_value_.tbb::atomic<uint_32>::compare_and_swap((uint_32&)value,(uint_32&)compare);
     return reinterpret_cast<const float&>(value_);
    }

    operator float() const volatile // volatile qualifier here for backwards compatibility 
    {
     const uint_32 value_ = atomic_value_;
     return reinterpret_cast<const float&>(value_);
    }

    float operator=(float value)
    {
     const uint_32 value_ = atomic_value_.tbb::atomic<uint_32>::operator =((uint_32&)value);
     return reinterpret_cast<const float&>(value_);
    }

    float operator+=(float value)
    {
     volatile float old_value_, new_value_;
     do
     {
      old_value_ = reinterpret_cast<float&>(atomic_value_);
      new_value_ = old_value_ + value;
     } while(compare_and_swap(new_value_,old_value_) != old_value_);
     return (new_value_);
    }

    float operator*=(float value)
    {
     volatile float old_value_, new_value_;
     do
     {
      old_value_ = reinterpret_cast<float&>(atomic_value_);
      new_value_ = old_value_ * value;
     } while(compare_and_swap(new_value_,old_value_) != old_value_);
     return (new_value_);
    }

    float operator/=(float value)
    {
     volatile float old_value_, new_value_;
     do
     {
      old_value_ = reinterpret_cast<float&>(atomic_value_);
      new_value_ = old_value_ / value;
     } while(compare_and_swap(new_value_,old_value_) != old_value_);
     return (new_value_);
    }

    float operator-=(float value)
    {
     return this->operator+=(-value);
    }

    float operator++() 
    {
     return this->operator+=(1);
    }

    float operator--() 
    {
     return this->operator+=(-1);
    }

    float fetch_and_add( float addend ) 
    {
     return this->operator+=(-addend);
    }

    float fetch_and_increment() 
    {
     return this->operator+=(1);
    }

    float fetch_and_decrement() 
    {
     return this->operator+=(-1);
    }
   };

Thanks!

Edit: Changed size_t to uint32_t as Greg Rogers suggested; that way it's more portable.

Edit: Added a listing for the entire thing, with some fixes.

More Edits: Performance-wise, using a locked float for 5,000,000 += operations with 100 threads on my machine takes 3.6 s, while my atomic float, even with its silly do-while, takes 0.2 s to do the same work. So the roughly 18x performance boost means it's worth it, if (and this is the catch) it's correct.
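
For reference, the comparison was roughly of this shape; this is a minimal sketch rather than the exact harness (it drives the work through tbb::parallel_for instead of 100 hand-made threads, and the counts are only illustrative):

  #include <tbb/task_scheduler_init.h>
  #include <tbb/parallel_for.h>
  #include <tbb/blocked_range.h>
  #include <tbb/mutex.h>
  #include <tbb/tick_count.h>
  #include <cstdio>

  static float       locked_value = 0.0f;   // protected by value_mutex
  static tbb::mutex  value_mutex;
  static AtomicFloat atomic_value;           // the AtomicFloat from the listing above

  struct LockedAdd
  {
    void operator()( const tbb::blocked_range<int>& r ) const
    {
      for( int i = r.begin(); i != r.end(); ++i )
      {
        tbb::mutex::scoped_lock lock(value_mutex);
        locked_value += 1.0f;
      }
    }
  };

  struct AtomicAdd
  {
    void operator()( const tbb::blocked_range<int>& r ) const
    {
      for( int i = r.begin(); i != r.end(); ++i )
        atomic_value += 1.0f;                // lock-free read-modify-write
    }
  };

  int main()
  {
    tbb::task_scheduler_init init;           // default number of worker threads
    const int iterations = 5000000;

    tbb::tick_count t0 = tbb::tick_count::now();
    tbb::parallel_for(tbb::blocked_range<int>(0, iterations), LockedAdd());
    tbb::tick_count t1 = tbb::tick_count::now();
    tbb::parallel_for(tbb::blocked_range<int>(0, iterations), AtomicAdd());
    tbb::tick_count t2 = tbb::tick_count::now();

    std::printf("locked: %.3fs  atomic: %.3fs\n",
                (t1 - t0).seconds(), (t2 - t1).seconds());
    return 0;
  }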

Even More Edits: As Awgn pointed out, my fetch_and_xxxx parts were all wrong. I fixed that, removed the parts of the API I'm not sure about (templated memory models), and implemented the other operations in terms of operator+= to avoid code repetition.

Added: operator*= and operator/=, since floats wouldn't be floats without them. Thanks to peterchen's comment for pointing this out.

Edit: The latest version of the code follows (I'll leave the old version above for reference, though).

  #include <tbb/atomic.h>
  typedef unsigned int   uint_32;
  typedef unsigned __TBB_LONG_LONG uint_64;

  template<typename FLOATING_POINT,typename MEMORY_BLOCK>
  struct atomic_float_
  {
    /* CRC Card -----------------------------------------------------
    | Class:   atomic float template class
    |
    | Responsibility: handle integral atomic memory as if it were a
    |     float, partially bypassing the FPU/SSE/MMX, so it is
    |     slower than a plain float, but faster and smaller
    |     than a locked float.
    |      *Warning* If your float usage is thwarted by
    |     the A-B-A problem, this class isn't for you.
    |      *Warning* The atomic specification says we return
    |     values, not l-values. So (i = j) = k doesn't work.
    |
    | Collaborators: Intel's tbb::atomic handles memory atomicity
    ----------------------------------------------------------------*/
    typedef atomic_float_<FLOATING_POINT,MEMORY_BLOCK> self_t;

    tbb::atomic<MEMORY_BLOCK> atomic_value_;

    template<memory_semantics M>
    FLOATING_POINT fetch_and_store( FLOATING_POINT value ) 
    {
     const MEMORY_BLOCK value_ = 
      atomic_value_.tbb::atomic<MEMORY_BLOCK>::template fetch_and_store<M>((MEMORY_BLOCK&)value);
     //atomic specification requires returning old value, not new one
     return reinterpret_cast<const FLOATING_POINT&>(value_);
    }

    FLOATING_POINT fetch_and_store( FLOATING_POINT value ) 
    {
     const MEMORY_BLOCK value_ = 
      atomic_value_.tbb::atomic<MEMORY_BLOCK>::fetch_and_store((MEMORY_BLOCK&)value);
     //atomic specification requires returning old value, not new one
     return reinterpret_cast<const FLOATING_POINT&>(value_);
    }

    template<memory_semantics M>
    FLOATING_POINT compare_and_swap( FLOATING_POINT value, FLOATING_POINT comparand ) 
    {
     const MEMORY_BLOCK value_ = 
      atomic_value_.tbb::atomic<MEMORY_BLOCK>::template compare_and_swap<M>((MEMORY_BLOCK&)value,(MEMORY_BLOCK&)comparand);
     //atomic specification requires returning old value, not new one
     return reinterpret_cast<const FLOATING_POINT&>(value_);
    }

    FLOATING_POINT compare_and_swap(FLOATING_POINT value, FLOATING_POINT compare)
    {
     const MEMORY_BLOCK value_ = 
      atomic_value_.tbb::atomic<MEMORY_BLOCK>::compare_and_swap((MEMORY_BLOCK&)value,(MEMORY_BLOCK&)compare);
     //atomic specification requires returning old value, not new one
     return reinterpret_cast<const FLOATING_POINT&>(value_);
    }

    operator FLOATING_POINT() const volatile // volatile qualifier here for backwards compatibility 
    {
     const MEMORY_BLOCK value_ = atomic_value_;
     return reinterpret_cast<const FLOATING_POINT&>(value_);
    }

    //Note: atomic specification says we return a copy of the base value, not an l-value
    FLOATING_POINT operator=(FLOATING_POINT rhs) 
    {
     const MEMORY_BLOCK value_ = atomic_value_.tbb::atomic<MEMORY_BLOCK>::operator =((MEMORY_BLOCK&)rhs);
     return reinterpret_cast<const FLOATING_POINT&>(value_);
    }

    //Note: atomic specification says we return an l-value when assigning among atomics
    self_t& operator=(const self_t& rhs) 
    {
     atomic_value_ = (MEMORY_BLOCK)rhs.atomic_value_; //copy the stored bits through the atomic's own load and store
     return *this;
    }

    FLOATING_POINT& _internal_reference() const
    {
     return reinterpret_cast<FLOATING_POINT&>(atomic_value_.tbb::atomic<MEMORY_BLOCK>::_internal_reference());
    }

    FLOATING_POINT operator+=(FLOATING_POINT value)
    {
     FLOATING_POINT old_value_, new_value_;
     do
     {
      old_value_ = reinterpret_cast<FLOATING_POINT&>(atomic_value_);
      new_value_ = old_value_ + value;
     //floating point binary representation is not an issue because
     //we are using our self's compare and swap, thus comparing floats and floats
     } while(self_t::compare_and_swap(new_value_,old_value_) != old_value_);
     return (new_value_); //return resulting value
    }

    FLOATING_POINT operator*=(FLOATING_POINT value)
    {
     FLOATING_POINT old_value_, new_value_;
     do
     {
      old_value_ = reinterpret_cast<FLOATING_POINT&>(atomic_value_);
      new_value_ = old_value_ * value;
     //floating point binary representation is not an issue because
     //we are using our self's compare and swap, thus comparing floats and floats
     } while(self_t::compare_and_swap(new_value_,old_value_) != old_value_);
     return (new_value_); //return resulting value
    }

    FLOATING_POINT operator/=(FLOATING_POINT value)
    {
     FLOATING_POINT old_value_, new_value_;
     do
     {
      old_value_ = reinterpret_cast<FLOATING_POINT&>(atomic_value_);
      new_value_ = old_value_ / value;
     //floating point binary representation is not an issue because
     //we are using our self's compare and swap, thus comparing floats and floats
     } while(self_t::compare_and_swap(new_value_,old_value_) != old_value_);
     return (new_value_); //return resulting value
    }

    FLOATING_POINT operator-=(FLOATING_POINT value)
    {
     return this->operator+=(-value); //return resulting value
    }

    //Prefix operator
    FLOATING_POINT operator++()
    {
     return this->operator+=(1); //return resulting value
    }

    //Prefix operator
    FLOATING_POINT operator--() 
    {
     return this->operator+=(-1); //return resulting value
    }

    //Postfix operator
    FLOATING_POINT operator++(int)
    {
     const FLOATING_POINT temp = this;
     this->operator+=(1);
     return temp//return resulting value
    }

    //Postfix operator
    FLOATING_POINT operator--(int) 
    {
     const FLOATING_POINT temp = this;
     this->operator+=(1);
     return temp//return resulting value
    }

    FLOATING_POINT fetch_and_add( FLOATING_POINT addend ) 
    {
     const FLOATING_POINT old_value_ = *this; //read the stored bits as a float, not an integer-to-float conversion
     this->operator+=(addend);
     //atomic specification requires returning old value, not new one as in operator x=
     return old_value_; 
    }

    FLOATING_POINT fetch_and_increment() 
    {
     const FLOATING_POINT old_value_ = *this; //read the stored bits as a float, not an integer-to-float conversion
     this->operator+=(+1);
     //atomic specification requires returning old value, not new one as in operator x=
     return old_value_; 
    }

    FLOATING_POINT fetch_and_decrement() 
    {
     const FLOATING_POINT old_value_ = *this; //read the stored bits as a float, not an integer-to-float conversion
     this->operator+=(-1);
     //atomic specification requires returning old value, not new one as in operator x=
     return old_value_; 
    }
  };

  typedef atomic_float_<float,uint_32> AtomicFloat;
  typedef atomic_float_<double,uint_64> AtomicDouble;
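
To show the intended usage, the class behaving like a plain float data member, here is a tiny hypothetical example (Particle and Deposit are made-up names, and the counts are only illustrative):

  #include <tbb/task_scheduler_init.h>
  #include <tbb/parallel_for.h>
  #include <tbb/blocked_range.h>
  #include <cstdio>

  // Hypothetical class that keeps an atomic float as an ordinary-looking member.
  struct Particle
  {
    AtomicFloat energy; // shared and updated concurrently, without a lock
  };

  // TBB body that deposits energy into the same particle from many tasks.
  struct Deposit
  {
    Particle* p;
    void operator()( const tbb::blocked_range<int>& r ) const
    {
      for( int i = r.begin(); i != r.end(); ++i )
        p->energy += 0.5f; // lock-free read-modify-write
    }
  };

  int main()
  {
    tbb::task_scheduler_init init;
    Particle particle;
    particle.energy = 0.0f;                   // assignment goes through the atomic

    Deposit body = { &particle };
    tbb::parallel_for(tbb::blocked_range<int>(0, 1000000), body);

    const float total = particle.energy;      // implicit conversion back to float
    std::printf("total energy: %f\n", total); // expected 500000.0
    return 0;
  }
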
+3  A: 

It looks like your implementation assumes that sizeof(size_t) == sizeof(float). Will that always be true for your target platforms?

And I wouldn't say threading heresy so much as casting heresy. :)

Greg Hewgill
Well, not necessarily, but I plan on putting in a static assert that checks sizeof(float) == sizeof(size_t) as a guard at compile time.
Robert Gould
What does that gain you over just using uint32_t?
Greg Rogers
Good point my friend!
Robert Gould
It looks like your implementation assumes that sizeof(uint32_t) == sizeof(float). Will that always be true for your target platforms? Will that always be true for your compilers?
Windows programmer
It's probably good enough for his current platform; if there are any future platforms, a static assert will let him know when that isn't the case, and if he wants to get really clever he can probably do different macro defines for different platforms.
Dan
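
A compile-time guard along the lines Robert mentions could be written in plain C++03, without macros; a sketch (the helper names here are made up, and BOOST_STATIC_ASSERT or any STATIC_ASSERT macro would do the same job):

  // Undefined for false, defined only for true: instantiating the false case
  // is a compile error.
  template<bool> struct atomic_float_size_check;
  template<>     struct atomic_float_size_check<true> {};

  template<typename FLOATING_POINT, typename MEMORY_BLOCK>
  struct atomic_float_size_guard
    : atomic_float_size_check<sizeof(FLOATING_POINT) == sizeof(MEMORY_BLOCK)>
  {};

  // Instantiating these anywhere forces the checks; the build breaks on any
  // platform where the sizes don't match.
  inline void atomic_float_size_checks()
  {
    atomic_float_size_guard<float,  unsigned int>       check_float;
    atomic_float_size_guard<double, unsigned long long> check_double;
    (void)check_float; (void)check_double;
  }
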
+5  A: 

I would seriously advise against public inheritance. I don't know what the atomic implementation is like, but I'm assuming it has overloaded operators that use it as the integral type, which means that those promotions will be used instead of your float in many (maybe most?) cases.

I don't see any reason why that wouldn't work, but like you I have no way to prove it...

One note: your operator float() routine does not have load-acquire semantics, and shouldn't it be marked const volatile (or definitely at least const)?

EDIT: If you are going to provide operator--() you should provide both prefix/postfix forms.

Greg Rogers
Doing composition is probably the better solution. I should probably refactor the class if the implementation is ok.
Robert Gould
Fully agree with inheritance -> composition.
xtofl
A: 

From my reading of that code, I would be really mad at any compiler that put out non-atomic assembly for this.

Joshua
A: 

Have your compiler generate assembly code and take a look at it. If the operation is more than a single assembly-language instruction, then it's not an atomic operation, and requires locks to operate properly in multiprocessor systems.

Unfortunately, I'm not certain that the opposite is also true -- that single-instruction operations are guaranteed to be atomic. I don't know the details of multiprocessor programming down to that level. I could make a case for either result. (If anyone else has some definitive information on that, feel free to chime in.)

Head Geek
Single ASM instructions should be considered non-atomic until proven otherwise, especially on x86 and other CISCy architectures, since an instruction is broken down into micro-ops, betwixt which you might have a context switch. Atomic insns like compare-and-swap disable interrupts to elide this.
Matt J
Single assembly language instructions are non-atomic in multiprocessor systems regardless of whether any of the processors does a context switch. The way to obtain atomicity is to use operations that are specially designed for it, such as compare-and-swap, or lock, or Dekker's algorithm.
Windows programmer
Of course, in a multiprocessor system, the context switch itself is irrelevant, but the fact that you should examine every possible interleaving of thread execution doesn't change whether multiple threads are arbitrarily time-multiplexed onto a core, or time-multiplexed into shared memory.
Matt J
A: 

This is the state of the code as it stands now, after the discussion on the Intel boards; it is the same listing as the "latest version" in the question above, and it still hasn't been thoroughly verified to work correctly in all scenarios.

Robert Gould
A: 

Either you or I have some studying to do about references to objects that were formerly on the stack.

Windows programmer
Probably both of us :)
Robert Gould
I get downvoted for it and you don't. Until we study it, what would happen if you change your casts in "return reinterpret_cast<const float" to lose the ampersands?
Windows programmer
I'll upvote you, because this is not a silly point. However, I suspect you might have been downvoted because this should be a comment and not an answer. As for your question: if I lose the ampersands, all hell breaks loose. They are there so the memory is reinterpreted as a float, not converted to one.
Robert Gould
+1  A: 

Although the size of a uint32_t may be the same as that of a float on a given arch, by casting one into the other you are implicitly assuming that atomic increments, decrements, and all the other operations on the bits are semantically equivalent on both types, which they are not in reality. I doubt it works as expected.

Nicola Bonelli
No, no I'm not; that's why I'm pulling the actual operations out into a transaction-style while loop (a known parallel pattern). Anyway, I can assure you the code works correctly in a single thread, and it has even been working correctly multithreaded. I'm just not sure whether that's something I can trust...
Robert Gould
I didn't pay much attention to the operators. But the question is: are you sure that fetch_and_add, fetch_and_increment, etc. are working in the right way?
Nicola Bonelli
You are right! I hadn't actually given them much thought since I was testing the operators. The fetch_xxxx are all wrong! Silly of me to miss that; they need the same treatment as the operators.
Robert Gould
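
One way to give the fetch_xxxx functions that same treatment is to fold the read into the CAS loop itself, so the value returned is exactly the one the winning swap replaced. An untested sketch of such a member for the atomic_float_ template above:

    //Sketch only: a member of atomic_float_, in the same style as the listing above
    FLOATING_POINT fetch_and_add( FLOATING_POINT addend )
    {
     FLOATING_POINT old_value_, new_value_;
     do
     {
      old_value_ = reinterpret_cast<FLOATING_POINT&>(atomic_value_);
      new_value_ = old_value_ + addend;
     } while(self_t::compare_and_swap(new_value_,old_value_) != old_value_);
     //atomic specification requires returning the old value; here it is the
     //value the successful compare_and_swap actually replaced
     return old_value_;
    }
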
+1  A: 

I strongly doubt that you get the correct values in fetch_and_add etc, as float addition is different from int addition.

Here's what I get from these arithmetics:

1   + 1    =  1.70141e+038  
100 + 1    = -1.46937e-037  
100 + 0.01 =  1.56743e+038  
23  + 42   = -1.31655e-036

So yeah, thread-safe, but not what you expect.

The lock-free algorithms (operator+= etc.) should work as far as atomicity goes (I haven't checked the algorithm itself, though).


Other solution: As it is all additions and subtractions, you might be able to give every thread its own instance, then add the results from multiple threads.

peterchen
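
To see concretely where numbers like those come from, here is what adding the raw bit patterns of two floats as integers produces (a standalone demo; the posted class avoids this by doing the addition on real floats inside the CAS loop):

  #include <cstdio>
  #include <cstring>

  int main()
  {
    float a = 1.0f, b = 1.0f;
    unsigned int ia, ib, isum;
    std::memcpy(&ia, &a, sizeof ia);
    std::memcpy(&ib, &b, sizeof ib);
    isum = ia + ib;                           // integer addition of the bit patterns

    float bogus;
    std::memcpy(&bogus, &isum, sizeof bogus);
    std::printf("1 + 1 \"=\" %g\n", bogus);   // prints roughly 1.70141e+38
    return 0;
  }
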
Note I'm not doing that. I'm casting the ints into float refs, which means they are handled correctly: old_value_ = reinterpret_cast<float&>(atomic_value_); new_value_ = old_value_ + value;
Robert Gould
That would be a fine solution for a "reduce", but I need floats as members of data structures (properties) that have long lives. But your comment does remind me that floats are silly without multiplication and division. Gonna add those too.
Robert Gould
The revised code looks much better! :) And yes, the lock-free loops look ok to me, but I haven't done enough with those to really judge.
peterchen
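
For the per-thread-accumulation idea peterchen suggests, a sketch with tbb::parallel_reduce over plain, non-atomic floats (names are illustrative; this fits one-shot sums rather than the long-lived members discussed above):

  #include <tbb/parallel_reduce.h>
  #include <tbb/blocked_range.h>
  #include <cstddef>

  // Each task accumulates into its own plain float; the partial sums are joined
  // at the end, so no atomics or locks are needed.
  struct PartialSum
  {
    float sum;
    const float* data;

    PartialSum( const float* d ) : sum(0.0f), data(d) {}
    PartialSum( PartialSum& other, tbb::split ) : sum(0.0f), data(other.data) {}

    void operator()( const tbb::blocked_range<std::size_t>& r )
    {
      for( std::size_t i = r.begin(); i != r.end(); ++i )
        sum += data[i];
    }

    void join( const PartialSum& other ) { sum += other.sum; }
  };

  float parallel_total( const float* values, std::size_t n )
  {
    PartialSum body(values);
    tbb::parallel_reduce(tbb::blocked_range<std::size_t>(0, n), body);
    return body.sum;
  }
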
+1  A: 

Just a note about this (I wanted to make a comment but apparently new users aren't allowed to comment): Using reinterpret_cast on references produces incorrect code with gcc 4.1 -O3. This seems to be fixed in 4.4 because there it works. Changing the reinterpret_casts to pointers, while slightly uglier, works in both cases.

Lutorm
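
For reference, the pointer flavor Lutorm mentions, plus a memcpy variant that sidesteps the aliasing question entirely, would look roughly like this (illustrative helpers, not part of the posted class):

  #include <cstring>

  // Pointer variant of the bits-to-float conversion, as suggested above:
  inline float bits_as_float( const unsigned int& bits )
  {
    return *reinterpret_cast<const float*>(&bits);
  }

  // memcpy variant, which avoids the strict-aliasing issue altogether;
  // compilers typically optimize the copy away:
  inline float bits_as_float_by_copy( unsigned int bits )
  {
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
  }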