Given that:
1) The C++03 standard does not address the existence of threads in any way
2) The C++03 standard leaves it up to implementations to decide whether std::string should use Copy-on-Write (COW) semantics in its copy constructor
3) Copy-on-Write semantics often lead to unpredictable behavior in a multi-threaded program
I come to the following, seemingly controversial, conclusion:
You simply cannot safely and portably use std::string in a multi-threaded program
Obviously, no STL data structure is thread-safe. But at least, with std::vector for example, you can simply use mutexes to protect access to the vector. With a std::string implementation that uses COW, you can't even reliably do that without editing the reference-counting semantics deep within the vendor implementation.
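To make the contrast concrete, here is a rough sketch of the mutex pattern I mean. The class names LockedVector and LockedString are made up for illustration, and std::mutex is a C++11 facility that C++03 does not have (pthread mutexes or Boost.Thread would play the same role there), so treat this as an illustration under those assumptions rather than the code from our application.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical wrapper: every access to the vector goes through one mutex.
class LockedVector {
public:
    void push(int value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(value);
    }

    // The copy is made while the lock is held. Copying a std::vector copies
    // its elements, so the result shares no state with items_.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return items_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> items_;
};

// The same pattern applied to a string (also hypothetical).
class LockedString {
public:
    void set(const std::string& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        text_ = value;
    }

    // Under a COW implementation the returned copy may still refer to the
    // same reference-counted buffer as text_. The caller then keeps touching
    // that hidden shared state after the lock has been released.
    std::string get() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return text_;
    }

private:
    mutable std::mutex mutex_;
    std::string text_;
};
```

Both classes look equally safe from the outside. The difference is only in what the copy constructor does behind the scenes, which is exactly the part the standard leaves to the implementation.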
Real-world example:
At my company, we have a multi-threaded application which has been thoroughly unit-tested and run through Valgrind countless times. The application ran for months with no problems whatsoever. One day, I recompiled the application with another version of gcc, and all of a sudden I got random segfaults all the time. Valgrind now reports invalid memory accesses deep within libstdc++, in the std::string copy constructor.
So what is the solution? Well, of course, I could typedef std::vector<char> as a string class, but really, that sucks. I could also wait for C++0x, which I pray will require implementors to forgo COW. Or (shudder) I could use a custom string class. I personally always rail against developers who implement their own classes when a preexisting library will do fine, but honestly, I need a string class which I can be sure is not using COW semantics, and std::string simply doesn't guarantee that.
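One stopgap that stays within std::string is to avoid the copy constructor whenever a string crosses a thread boundary and build the copy from raw characters instead. The helper name deep_copy below is hypothetical, and this is only a sketch of the idea: the (pointer, length) constructor sees nothing but characters, so it has no source string object to share a buffer with.

```cpp
#include <string>

// Hypothetical helper, not part of the standard library.
// Builds the result from a character pointer and a length rather than
// through the copy constructor, so the characters are copied into a
// buffer owned by the new string. Using data()/size() instead of c_str()
// alone keeps any embedded '\0' characters.
std::string deep_copy(const std::string& source) {
    return std::string(source.data(), source.size());
}
```

The call itself still reads the source string, so it would have to happen while holding whatever mutex protects that string. It also does nothing about copies made later through the ordinary copy constructor, so it needs discipline at every hand-off point.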
So, what say you, Stack Overflow? Am I right that std::string simply cannot be used reliably at all in portable, multi-threaded programs? And what is a good workaround?