Given that:

1) The C++03 standard does not address the existence of threads in any way

2) The C++03 standard leaves it up to implementations to decide whether std::string should use Copy-on-Write semantics in its copy-constructor

3) Copy-on-Write semantics often lead to unpredictable behavior in a multi-threaded program

I come to the following, seemingly controversial, conclusion:

You simply cannot safely and portably use std::string in a multi-threaded program

Obviously, no STL data structure is thread-safe. But at least, with std::vector for example, you can simply use mutexes to protect access to the vector. With an std::string implementation that uses COW, you can't even reliably do that without editing the reference counting semantics deep within the vendor implementation.
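To make the point about hidden sharing concrete, here is a minimal sketch of a toy copy-on-write string. `CowString` and all of its members are hypothetical names made up for illustration; this is not how any particular vendor implements `std::string`. It only shows that two distinct objects can end up pointing at one buffer and one plain (non-atomic) counter, which is state that a mutex around either object alone does not cover.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy class, NOT a real library implementation: just enough
// copy-on-write to show where the hidden shared state lives.
class CowString {
    struct Rep {
        std::string chars;   // the character buffer
        long refs;           // plain, non-atomic count shared by all copies
        explicit Rep(const std::string& s) : chars(s), refs(1) {}
    };
    Rep* rep_;

    void release() {
        if (--rep_->refs == 0) delete rep_;
    }

public:
    explicit CowString(const std::string& s) : rep_(new Rep(s)) {}

    // Copying only bumps the shared counter; no characters are copied.
    CowString(const CowString& other) : rep_(other.rep_) { ++rep_->refs; }

    CowString& operator=(const CowString& other) {
        ++other.rep_->refs;   // increment first so self-assignment is safe
        release();
        rep_ = other.rep_;
        return *this;
    }

    ~CowString() { release(); }

    // Writing detaches from the shared buffer first (the "copy on write").
    void set(std::size_t i, char c) {
        assert(i < rep_->chars.size());
        if (rep_->refs > 1) {
            Rep* own = new Rep(rep_->chars);
            --rep_->refs;
            rep_ = own;
        }
        rep_->chars[i] = c;
    }

    const std::string& str() const { return rep_->chars; }
    bool shares_buffer_with(const CowString& o) const { return rep_ == o.rep_; }
};
```

If thread 1 owns `a` and thread 2 owns a copy `b`, both `++refs` and `--refs` above touch the same counter even though each thread only ever uses "its own" string. That is the kind of access a per-object mutex cannot see.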

Real-world example:

At my company, we have a multi-threaded application which has been thoroughly unit-tested and run through Valgrind countless times. The application ran for months with no problems whatsoever. One day, I recompiled the application with another version of gcc, and all of a sudden I got random segfaults all the time. Valgrind now reports invalid memory accesses deep within libstdc++, in the std::string copy constructor.

So what is the solution? Well, of course, I could typedef std::vector<char> as a string class - but really, that sucks. I could also wait for C++0x, which I pray will require implementors to forgo COW. Or, (shudder), I could use a custom string class. I personally always rail against developers who implement their own classes when a preexisting library will do fine, but honestly, I need a string class which I can be sure is not using COW semantics; and std::string simply doesn't guarantee that.

So, what say you, stackoverflow? Am I right that std::string simply cannot be used reliably at all in portable, multi-threaded programs? And what is a good workaround?

A: 

I regulate the string access:

  • make std::string members private
  • return const std::string& for getters
  • setters modify the member

This has always worked fine for me and is correct data hiding.

jdehaan
Does returning a const reference guarantee correct behaviour? I previously thought so, but I got caught out by some nasty concurrency problems recently, e.g. `const std::string return m_MyString; }`. In this instance, doesn't the mutex get unlocked *before* the return value's copy constructor runs? In which case you're liable to get race conditions.
the_mandrill
apologies about the formatting...
the_mandrill
@the_mandrill: "doesn't the mutex get unlocked before the return value's copy constructor" Of course it gets unlocked before that. All you protect is the creation of a reference. Return by copy in this scenario.
sbi
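A minimal sketch of the "return by copy" suggestion, assuming a C++11 compiler for `std::mutex` and `std::lock_guard` (with C++03 you would substitute your platform's or Boost's lock types). `Settings`, `set_name` and `name` are hypothetical names used only for illustration.

```cpp
#include <mutex>
#include <string>

// Hypothetical class for illustration; requires C++11 for std::mutex.
class Settings {
    mutable std::mutex mutex_;
    std::string name_;

public:
    void set_name(const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        name_ = value;
    }

    // Returns by value: the returned string is copy-constructed from name_
    // before `lock` is destroyed, so the copy is made under the mutex.
    std::string name() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return name_;
    }
};
```

Returning `const std::string&` here would only protect the creation of the reference; the caller would then read `name_` after the lock is gone. Note that on a copy-on-write implementation the returned copy may still share its buffer with `name_`, which is the deeper problem the question is about.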
This doesn't address the issue. It has nothing to do with encapsulation; the problem is much deeper. The fact is, a multi-threaded program can't *ever* modify an `std::string` that has been copied, and expect to work properly with all standard-compliant compilers.
Charles Salvia
+2  A: 

Given that the standard doesn't say a word about memory models and is completely thread-unaware, I'd say you can't assume that every implementation will be non-COW. So no, you can't.

Apart from that, if you know your tools, most of the implementations will use non-COW strings to allow multi-threading.

Arkaitz Jimenez
"most of the implementations will use non-cow strings to allow multi-threading." Not quite true; in fact **most** C++ compilers implement COW strings: gcc, Intel, HP... MSVC does not. Most implementations (with the exception of MSVC6) are thread-safe because they use atomic counters.
Artyom
+6  A: 

You are right. This will be fixed in C++0x. For now you have to rely on your implementation's documentation. For example, recent libstdc++ versions (GCC) let you use string objects as if no string object shares its buffer with another one. C++0x forces a library implementation to protect the user from "hidden sharing".
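If the documentation is unclear, a rough probe can at least tell you whether a fresh copy shares its buffer on the implementation you are building with. This is a sketch only: `copy_shares_buffer` is a hypothetical helper, and a `false` result says nothing about other platforms or other versions of the same library.

```cpp
#include <string>

// Hypothetical helper: reports whether a fresh copy shares its character
// buffer with the original. Only const access is used, so that data()
// cannot trigger an "unshare" on a copy-on-write implementation.
bool copy_shares_buffer(const std::string& original) {
    const std::string copy(original);
    return copy.data() == original.data();
}
```

Calling it with a reasonably long string (for example `std::string(64, 'x')`) avoids being misled by small-string buffers stored inside the object itself.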

sellibitze
A: 

If you want to disable COW semantics, you could force your strings to make copies:

// instead of:
string newString = oldString;

// do this:
string newString = oldString.c_str();
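One caveat with the `c_str()` form: it stops at the first `'\0'`, so a string with embedded null characters gets truncated. A small variation, sketched below with a hypothetical helper name `deep_copy`, builds the copy from a pointer and a length instead, which keeps every character.

```cpp
#include <string>

// Hypothetical helper: builds the copy from a pointer and a length, so the
// new string gets its own storage and embedded '\0' characters are kept.
std::string deep_copy(const std::string& source) {
    return std::string(source.data(), source.size());
}
```

Usage would be `string newString = deep_copy(oldString);`.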
Bill
It seems easier and better (especially for code quality) just to use another implementation.
sbi
+4  A: 

You cannot safely and portably do anything in a multi-threaded program. There is no such thing as a portable multi-threaded C++ program, precisely because threads throw everything C++ says about order of operations, and the results of modifying any variable, out the window.

There's also nothing in the standard to guarantee that vector can be used in the way you say. It would be legal to provide a C++ implementation with a threading extension in which, say, any use of a vector outside the thread in which it was initialized results in undefined behavior. The instant you start a second thread, you aren't using standard C++ any more, and you must look to your compiler vendor for what is safe and what is not.

If your vendor provides a threading extension, and also provides a std::string with COW that (therefore) cannot be made thread-safe, then I think for the time being your argument is with your vendor, or with the threading extension, not with the C++ standard. For example, arguably POSIX should have barred COW strings in programs which use pthreads.

You could possibly make it safe by having a single mutex, which you take while doing any string mutation whatsoever, and any reads of a string that's the result of a copy. But you'd probably get crippling contention on that mutex.
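A rough sketch of that single-mutex idea, assuming C++11 for `std::mutex` (pthreads or Boost locks would play the same role in C++03). The function names are hypothetical, and this only covers appending and copying; a complete version would need a guarded wrapper for every mutation, and destroying a copy also touches the shared count on a reference-counted implementation, which this sketch does not address.

```cpp
#include <mutex>
#include <string>

// All names here are hypothetical; requires C++11 for std::mutex.
std::mutex& string_mutex() {
    static std::mutex m;   // one process-wide lock for every string
    return m;
}

// Any mutation of any string goes through the global lock.
void guarded_append(std::string& target, const std::string& suffix) {
    std::lock_guard<std::mutex> lock(string_mutex());
    target += suffix;
}

// Copies are made under the same lock, since copying may touch shared state.
std::string guarded_copy(const std::string& source) {
    std::lock_guard<std::mutex> lock(string_mutex());
    return source;
}
```

Because every thread funnels through `string_mutex()`, this is where the crippling contention would come from.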

Steve Jessop
+2  A: 

You can use STLport. It provides non-COW strings. And it has the same behavior on different platforms.

This article presents a comparison of STL strings with copy-on-write and non-copy-on-write algorithms, based on STLport strings, ropes and GNU libstdc++ implementations.

At the company where I work, I have some experience running the same server application built with and without STLport on HP-UX 11.31. The application was compiled with gcc 4.3.1 at optimization level O2. When I run the program built with STLport, it processes requests 25% faster than the same program built without STLport (which uses gcc's own STL library).

I profiled both versions and found that the version without STLport spends much more time in pthread_mutex_unlock() (2.5%) than the version with STLport (1%). In the version without STLport, pthread_mutex_unlock() is called from one of the std::string functions.

However, when I changed the string assignments in the most frequently called functions after profiling, like this:

string_var = string_var.c_str(); // added .c_str()

there was a significant improvement in the performance of the version without STLport.

skwllsp
A: 

In MSVC, std::string is no longer a reference-counted shared pointer to a container. They chose to copy the entire contents by value in every copy constructor and assignment operator, to avoid multithreading problems.

Pavel Radzivilovsky
+3  A: 

A more correct way to look at it would be "You cannot safely and portably use C++ in a multithreaded environment". There is no guarantee that other data structures will behave sensibly either. Or that the runtime won't blow up your computer. The standard doesn't guarantee anything about threads.

So to do anything with threads in C++, you have to rely on implementation-defined guarantees. And then you can safely use std::string, because each implementation tells you whether or not it is safe to use in a threaded environment.

You lost all hope of true portability the moment you spawned a second thread. std::string isn't "less portable" than the rest of the language/library.

jalf
You're right, of course. If we're talking about what's guaranteed by the standard, no multi-threaded C++ program is portable. However, the *de facto* state of things is that you can write multi-threaded code that will behave consistently across most modern platforms using things like pthreads or boost threads - with the notable exception of code that uses `std::string`, due to the non-negligible possibility of COW semantics. Or to put it another way: multi-threaded code that uses `std::vector` is not likely to break *anywhere* these days; but the same cannot be said for `std::string`.
Charles Salvia