views:

124

answers:

4

I'm using Valgrind --tool=drd to check my application that uses Boost::thread. Basically, the application populates a set of "Book" values with "Kehai" values based on inputs through a socket connection.

On a seperate thread, a user can connect and get the books send to them.

Its fairly simple, so i figured using a boost::mutex::scoped_lock on the location that serializes the book and the location that clears out the book data should be suffice to prevent any race conditions. Here is the code:

 void Book::clear()
    {
     boost::mutex::scoped_lock lock(dataMutex);
     for(int i =NUM_KEHAI-1; i >= 0; --i)
     {
      bid[i].clear();

      ask[i].clear();
     }
    }

    int Book::copyChangedKehaiToString(char* dst) const
    {
     boost::mutex::scoped_lock lock(dataMutex);

     sprintf(dst, "%-4s%-13s",market.c_str(),meigara.c_str());
     int loc = 17;
     for(int i = 0; i < Book::NUM_KEHAI; ++i)
     {
      if(ask[i].changed > 0)
      {
       sprintf(dst+loc,"A%i%-21s%-21s%-21s%-8s%-4s",i,ask[i].price.c_str(),ask[i].volume.c_str(),ask[i].number.c_str(),ask[i].postTime.c_str(),ask[i].status.c_str());
       loc += 77;
      }
     }
     for(int i = 0; i < Book::NUM_KEHAI; ++i)
     {
      if(bid[i].changed > 0)
      {
       sprintf(dst+loc,"B%i%-21s%-21s%-21s%-8s%-4s",i,bid[i].price.c_str(),bid[i].volume.c_str(),bid[i].number.c_str(),bid[i].postTime.c_str(),bid[i].status.c_str());
       loc += 77;
      }
     }

     return loc;
    }

The clear() function and the copyChangedKehaiToString() function are called in the datagetting thread and data sending thread,respectively. Also, as a note, the class Book:

    struct Book
    {
    private:
     Book(const Book&); Book& operator=(const Book&);
    public:

     static const int NUM_KEHAI=10;
     struct Kehai;
     friend struct Book::Kehai;

     struct Kehai
     {
     private:
       Kehai& operator=(const Kehai&);
     public:
      std::string price;
      std::string volume;
      std::string number;
      std::string postTime;
      std::string status;

      int changed;
      Kehai();
      void copyFrom(const Kehai& other);
      Kehai(const Kehai& other);
      inline void clear()
      {

       price.assign("");
       volume.assign("");
       number.assign("");
       postTime.assign("");
       status.assign("");
       changed = -1;
      }
     };

     std::vector<Kehai> bid;
     std::vector<Kehai> ask;
     tm recTime;
     mutable boost::mutex dataMutex;


     Book();
     void clear();
     int copyChangedKehaiToString(char * dst) const;
      };

When using valgrind --tool=drd, i get race condition errors such as the one below:

==26330== Conflicting store by thread 1 at 0x0658fbb0 size 4
==26330==    at 0x653AE68: std::string::_M_mutate(unsigned int, unsigned int, unsigned int) (in /usr/lib/libstdc++.so.6.0.8)
==26330==    by 0x653AFC9: std::string::_M_replace_safe(unsigned int, unsigned int, char const*, unsigned int) (in /usr/lib/libstdc++.so.6.0.8)
==26330==    by 0x653B064: std::string::assign(char const*, unsigned int) (in /usr/lib/libstdc++.so.6.0.8)
==26330==    by 0x653B134: std::string::assign(char const*) (in /usr/lib/libstdc++.so.6.0.8)
==26330==    by 0x8055D64: Book::Kehai::clear() (Book.h:50)
==26330==    by 0x8094A29: Book::clear() (Book.cpp:78)
==26330==    by 0x808537E: RealKernel::start() (RealKernel.cpp:86)
==26330==    by 0x804D15A: main (main.cpp:164)
==26330== Allocation context: BSS section of /usr/lib/libstdc++.so.6.0.8
==26330== Other segment start (thread 2)
==26330==    at 0x400BB59: pthread_mutex_unlock (drd_pthread_intercepts.c:633)
==26330==    by 0xC59565: pthread_mutex_unlock (in /lib/libc-2.5.so)
==26330==    by 0x805477C: boost::mutex::unlock() (mutex.hpp:56)
==26330==    by 0x80547C9: boost::unique_lock<boost::mutex>::~unique_lock() (locks.hpp:340)
==26330==    by 0x80949BA: Book::copyChangedKehaiToString(char*) const (Book.cpp:134)
==26330==    by 0x80937EE: BookSerializer::serializeBook(Book const&, std::string const&) (BookSerializer.cpp:41)
==26330==    by 0x8092D05: BookSnapshotManager::getSnaphotDataList() (BookSnapshotManager.cpp:72)
==26330==    by 0x8088179: SnapshotServer::getDataList() (SnapshotServer.cpp:246)
==26330==    by 0x808870F: SnapshotServer::run() (SnapshotServer.cpp:183)
==26330==    by 0x808BAF5: boost::_mfi::mf0<void, RealThread>::operator()(RealThread*) const (mem_fn_template.hpp:49)
==26330==    by 0x808BB4D: void boost::_bi::list1<boost::_bi::value<RealThread*> >::operator()<boost::_mfi::mf0<void, RealThread>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, RealThread>&, boost::_bi::list0&, int) (bind.hpp:253)
==26330==    by 0x808BB90: boost::_bi::bind_t<void, boost::_mfi::mf0<void, RealThread>, boost::_bi::list1<boost::_bi::value<RealThread*> > >::operator()() (bind_template.hpp:20)
==26330== Other segment end (thread 2)
==26330==    at 0x400B62A: pthread_mutex_lock (drd_pthread_intercepts.c:580)
==26330==    by 0xC59535: pthread_mutex_lock (in /lib/libc-2.5.so)
==26330==    by 0x80546B8: boost::mutex::lock() (mutex.hpp:51)
==26330==    by 0x805473B: boost::unique_lock<boost::mutex>::lock() (locks.hpp:349)
==26330==    by 0x8054769: boost::unique_lock<boost::mutex>::unique_lock(boost::mutex&) (locks.hpp:227)
==26330==    by 0x8094711: Book::copyChangedKehaiToString(char*) const (Book.cpp:113)
==26330==    by 0x80937EE: BookSerializer::serializeBook(Book const&, std::string const&) (BookSerializer.cpp:41)
==26330==    by 0x808870F: SnapshotServer::run() (SnapshotServer.cpp:183)
==26330==    by 0x808BAF5: boost::_mfi::mf0<void, RealThread>::operator()(RealThread*) const (mem_fn_template.hpp:49)
==26330==    by 0x808BB4D: void boost::_bi::list1<boost::_bi::value<RealThread*> >::operator()<boost::_mfi::mf0<void, RealThread>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, RealThread>&, boost::_bi::list0&, int) (bind.hpp:253)

For the life of me, i can't figure out where the race condition is. As far as I can tell, clearing the kehai is done only after having taken the mutex, and the same holds true with copying it to a string. Does anyone have any ideas what could be causing this, or where I should look?

Thank you kindly.

A: 

Nevermind. I'm an idiot and managed to forget that C++ strings are mutable. I changed the code to use c-style strings and my race condition issues went away.

On an aside for anyone reading this post, does anyone know a good immutable string library for C++? I thought boost had one, but i haven't been able to find anything conclusive on it.

Thanks.

Nik
I've worked with threads and std::strings (or STL containres..) without any problem if everything is correctly protected. What compiler/STL do you use?
Nikko
With gcc and linux I never had problems..
Nikko
its a known issue with C++ stl strings. Since they use reference counting and are mutable this issue seems to come up quite regularly.using gcc 4.1 on Centos 5
Nik
as per my comment on another response, you could force the STL to copy rather than clone [string2 = string1.c_str()]. you could even create a "thread-safe string holder" class that encapsulates this work-around and is used in all shared objects that store strings -- then it's harder to forget to do it right, and easier to change implementation if you switch to a non-borked STL.
chrispy
+4  A: 

After your post I took time to learn about Valgrind and how its output should be read.

I can see the following:

You invoke Book::clear which in turns calls Book::Kehai::clear, where you assign a value to a string. Inside the std::string::assign the STL does something which stores some value at the address 0x0658fbb0.

Meanwhile the other thread has accessed the same memory location, hence this situation is considered a race condition.

Now look at the "context" of the other thread. Valgrind doesn't show its exact stack location, however it shows between which "segments" it has occured. According to Valgrind a segment is a consecutive block of memory accesses bounded by synchronization operations.

We see that this block starts with pthread_mutex_unlock and ends at pthread_mutex_lock. Means - the same memory location was accessed when your mutex was not locked, and that thread was somewhere outside of your two functions.

Now, look at the conflicting memory location information:

Allocation context: BSS section of /usr/lib/libstdc++.so.6.0.8

The BSS means that it's a global/static variable. And it's defined somewhere inside the libstdc.

Conclusion:

This race condition has nothing to do with your data structures. It's related to STL. One thread performs something to an std::string (assigns it to an empty string to be exact), whereas the other thread probably does something STL-related as well.

BTW I remember several years ago I've written a multi-threaded application, and there were problems with std::string there. As I found out - the STL implementation (which was a Dunkimware) actually implemented string as reference-counted, whereas the reference counting was not thread-safe.

Maybe this is what happens to you as well? Perhaps you should set some compiler flag/option when building a multi-threaded application?

valdo
valdo, thanks for the update. Yeah, i think its the same exact problem. when I switched everything to C-style strings, the errors went away.
Nik
It's a useful post nevertheless. Now I think to test my apps with Valgrind, I use multi-threading heavily.About strings:You say the problem disappears when you switch to "C-style strings". But what you call the "C-style strings"? More precisely - what is your policy about their manipulation and lifetime.I believe the problem is related to the fact if your strings are reference-counted or not. Means - when you assign one string to another: do you actually create a copy of your string, or you allocate another one identical to that?And if it's reference-counted - is it thread-safe?
valdo
STL is supposed to be thread-safe in the sense that it is not a problem to use them with thread if you lock correctly or if you just perform multi-thread reads
Nikko
A: 

STL is supposed to be thread-safe in the sense that it is not a problem to use them with thread if you lock correctly or if you just perform multi-thread reads

Surpirse! Yes, it's supposed, but let me tell you a store about what actually happened.

I had a multit-hreaded application. There was a datastructure with strings (std::string). It was locked with a critical section.

Other objects eventually needed to take strings from there. They made a copy of those strings in the following manner:

// Take a string
std::string str;
{
    Autolock l(g_CritSect);
    str = g_SomeStr;
}

The same strategy was used to adjust those strings:

// Put a string
std::string str;
{
    Autolock l(g_CritSect);
    g_SomeStr = str;
}

And, guess what? Crashes!

But why? Because the string's assignment statement doesn't really produce a copy of the memory block holding the string. Instead the same memory block is reused (referenced).

Well, this is not necessarily bad. But the bad thing was that std::string implemented reference counting of the strings not in a thread-safe way. It used regular arithmetic ++ and -- instead of InterlockedIncrement and etc.

What happened is that the str object references the same string. And then it eventually dereferences it in its destructor (or when explicitly assigned to another string). And this happens outside of the locked region.

So that those strings may not be used in a multi-threaded application. And it's almost impossible to implement correct locking to work this around, because the actual referenced data passes silently from object to object.

And that implementation of STL was declared thread-safe.

valdo
Presumably you could force the copy to actually copy, e.g. g_SomeStr = str.c_str() ? [Obviously a horrible work-around, and better to switch to a non-borked STL.]
chrispy
Yes there are always bad implementation, but it's not always the case
Nikko
A: 

This report may be ignored safely. It is triggered by how std::string is implemented in libstdc++. This issue has been solved in the libstdc++ version included with gcc 4.4.4 or later. See also GCC bugzilla item #40518 for the details.