I'm developing a multithreaded program running on Linux (compiled with G++ 4.3) and if you search around for a bit you find a lot of scary stories about std::string not being thread-safe with GCC. This is supposedly due to the fact that internally it uses copy-on-write which wreaks havoc with tools like Helgrind.

I've made a small program that copies one string to another string and if you inspect both strings they both share the same internal _M_p pointer. When one string is modified the pointer changes so the copy-on-write stuff is working fine.

What I'm worried about though is what happens if I share a string between two threads (for instance passing it as an object in a threadsafe dataqueue between two threads). I've already tried compiling with the '-pthread' option but that does not seem to make much difference. So my question:

  • Is there any way to force std::string to be threadsafe? I would not mind if the copy-on-write behaviour was disabled to achieve this.
  • How have other people solved this? Or am I being paranoid?

I can't seem to find a definitive answer, so I hope you guys can help me.

Edit:

Wow, that's a whole lot of answers in such a short time. Thank you! I will definitely use Jack's solution when I want to disable COW. But now the main question becomes: do I really have to disable COW? Or is the 'bookkeeping' done for COW thread safe? I'm currently browsing the libstdc++ sources but that's going to take quite some time to figure out...

Edit 2

OK, I browsed the libstdc++ source code and found something like this in libstdc++-v3/include/bits/basic_string.h:

  _CharT*
  _M_refcopy() throw()
  {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
    if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
      __gnu_cxx::__atomic_add_dispatch(&this->_M_refcount, 1);
    return _M_refdata();
  }  // XXX MT

So there is definitely something there about atomic changes to the reference counter...

Conclusion

I'm marking sellibitze's comment as answer here because I think we've reached the conclusion that this area is still unresolved for now. To circumvent the COW behaviour I'd suggest Jack Lloyd's answer. Thank you everybody for an interesting discussion!

+2  A: 

No STL container is thread-safe. That keeps the library general-purpose, usable in both single-threaded and multithreaded programs. In multithreaded code, you need to add the synchronization mechanism yourself.

Cătălin Pitiș
The difference with string though is that even passing it by value induces threading concerns - this can be somewhat unexpected, to say the least.
Jack Lloyd
Thanks guys for such quick answers! I know no STL container is threadsafe (and properly use locking wrappers when I need them to be threadsafe) but as far as I know the std::string is the only one to "secretly" use the same datastore for identical strings. My concern is that the copy-on-write bookkeeping is not thread safe at all, am I correct there?
Benjamin
I don't understand why this gets so many upvotes. Guys, please pay attention to the actual question
sellibitze
+6  A: 

If you don't mind disabling copy-on-write, this may be the best course of action. std::string's COW only works if it knows that it is copying another std::string, so you can cause it to always allocate a new block of memory and make an actual copy. For instance this code:

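#include <cstdio>
#include <string>

int main()
   {
   std::string orig = "I'm the original!";
   std::string copy_cow = orig;          // string-to-string copy: may share orig's buffer
   std::string copy_mem = orig.c_str();  // built from a const char*: gets a fresh buffer
   // %p expects a void pointer, so cast each data() result
   std::printf("%p %p %p\n", static_cast<const void*>(orig.data()),
                             static_cast<const void*>(copy_cow.data()),
                             static_cast<const void*>(copy_mem.data()));
   }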

will show that the second copy (using c_str) prevents COW: std::string only sees a bare const char* and has no idea where it came from or what its lifetime might be, so it has to make a new private copy.

Jack Lloyd
Minor caveat: assigning from the result of c_str() will truncate the string if orig contains embedded nulls. It would be safer to use the assign method (or the constructor) taking a const char* and a size_type, and to pass "orig.data()" and "orig.size()" to that.
Éric Malenfant
If I decide to go for the 'disable COW' route, I will definitely use this (and the added remark from Eric). Nicely done :)
Benjamin
@Eric Excellent point, I can't believe I didn't consider how that would interact with nulls. Thanks.
Jack Lloyd
Alternatively, you could use the initialization from iterators I think.
Matthieu M.
+3  A: 

Threads are not yet part of the standard. But I don't think that any vendor can get away without making std::string thread-safe, nowadays. Note: There are different definitions of "thread-safe" and mine might differ from yours. Of course, it makes little sense to protect a container like std::vector for concurrent access by default even when you don't need it. That would go against the "don't pay for things you don't use" spirit of C++. The user should always be responsible for synchronization if he/she wants to share objects among different threads. The issue here is whether a library component uses and shares some hidden data structures that might lead to data races even if "functions are applied on different objects" from a user's perspective.

The C++0x draft (N2960) contains the section "data race avoidance", which basically says that library components may access shared data that is hidden from the user if and only if they actively avoid possible data races. It sounds like a copy-on-write implementation of std::basic_string must be as safe w.r.t. multi-threading as another implementation where internal data is never shared among different string instances.

I'm not 100% sure about whether libstdc++ takes care of it already. I think it does. To be sure, check out the documentation

sellibitze
Thank you very much as well for that detailed answer. I checked the page you linked, but it remains a bit vague in my opinion, talking about containers in general (that you should provide adequate locking) and not so much about strings. :) The data race avoidance seems like at least an "excuse" to just go ahead and assume everything will be OK between threads, since that squarely puts the responsibility on the people implementing the library (provided of course that I as a programmer pass the strings by value instead of by reference)...
Benjamin
A: 

It seems that this was fixed a while ago: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5444 was closed as the same issue as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5432, which was fixed in 3.1.

See also http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6227

Éric Malenfant
I think that bug refers to the iostreams code though, not the string code?
Benjamin
Oh, right. I referred to this one because the one pertaining to basic_string (#5444) was closed as having the same resolution as 5432. I edited my answer to clarify this.
Éric Malenfant
Thanks Eric! As long as we're adding bugtracker items, this one also seems related: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40518
Benjamin
I found another big discussion on this subject: http://etbe.coker.com.au/2009/06/22/valgrindhelgrind-and-stl-string/
Benjamin