+7  Q: 

strings in C++

Hi, I have the following questions regarding strings in C++:

1>> Which is the better option (considering performance), and why?

1.

string a;
a = "hello!";

OR

2.

string *a;
a = new string("hello!");
...
delete(a);

2>>

string a;
a = "less"; 
a = "moreeeeeee";

How exactly is memory management handled in C++ when a bigger string is copied into a smaller string? Are C++ strings mutable?

+2  A: 
string a;
a = "hello!";

Two operations: it calls the default constructor std::string() and then calls operator=.

string *a; a = new string("hello!"); ... delete(a);

Only one operation: it calls the constructor std::string(const char*), but you must not forget to release your pointer.

What about string a("hello");

Pierre
Are you sure about all that? Did you look at the code produced by the compiler? Is the empty ctor really called?
Tim
Of course the compiler can optimize it away, if it is clever enough to determine that it has no side effects, and so the declaration and initialization can be merged back together. But that's not a given.
jalf
Is the pointer version really only one operation? Does the indirection not cause overhead?
yungchin
I guess it depends on what you call an "operation", but in terms of heap allocation, the pointer version is terrible:
- A block is allocated for the string object.
- string's ctor is called, which allocates a block for the content.
And then "delete a" does the reverse (i.e., frees the 2 blocks).
Éric Malenfant
Agree with Éric. It is much better to use std::string a( "hello" ); which is equivalent to std::string a = "hello"; since both of them call the const char* constructor.
David Rodríguez - dribeas
+14  A: 

It is almost never necessary or desirable to say

string * s = new string("hello");

After all, you would (almost) never say:

int * i = new int(42);

You should instead say

string s( "hello" );

or

string s = "hello";

And yes, C++ strings are mutable.
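
To make the mutability point concrete, here is a minimal sketch. The function name mutate_in_place is made up for illustration; everything it calls is a standard std::string member. It changes one string object several times without ever creating a second one.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: mutates a single std::string object in place.
std::string mutate_in_place() {
    std::string a = "less";
    a[0] = 'm';          // overwrite one character: "mess"
    a += "age";          // append to the same object: "message"
    a = "moreeeeeee";    // assign a longer value to the same object
    a.erase(4);          // drop everything from index 4 on: "more"
    assert(a.size() == 4);
    return a;
}
```

Each statement modifies the same object a; the string grows or shrinks its internal buffer as needed.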

anon
+4  A: 

Is there a specific reason why you constantly use assignment instead of initialization? That is, why don't you write

string a = "Hello";

etc.? This avoids a default construction and just makes more sense semantically. Creating a pointer to a string just for the sake of allocating it on the heap is never meaningful, i.e. your case 2 doesn't make sense and is slightly less efficient.

As to your last question, yes, strings in C++ are mutable unless declared const.
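
As a small sketch of the "unless declared const" part (the helper name const_length is an assumption for illustration): the standard type trait std::is_assignable can confirm at compile time that a const string rejects assignment while a non-const one accepts it.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// A non-const std::string can be assigned to; a const one cannot.
static_assert(std::is_assignable<std::string&, const char*>::value,
              "a mutable string accepts assignment");
static_assert(!std::is_assignable<const std::string&, const char*>::value,
              "a const string rejects assignment");

// Hypothetical helper: reading from a const string is still fine.
std::size_t const_length() {
    const std::string greeting = "Hello";
    // greeting += "!";   // would not compile: greeting is const
    return greeting.size();
}
```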

Konrad Rudolph
A: 

In case 1.1, the string's members (which include a pointer to the data) are held on the stack, and the memory occupied by the class instance is freed when a goes out of scope.

In case 1.2, the memory for the members themselves is also allocated dynamically on the heap.

When you assign a char* constant to a string, the memory that holds the data is reallocated, if necessary, to fit the new data.

You may see how much memory is allocated by calling string::capacity().

When you call string a("hello"), memory gets allocated in the constructor.

Both the constructor and the assignment operator call the same methods internally to allocate memory and copy the new data there.
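
A rough sketch of how one might observe this with capacity(). The struct Growth and the function assign_longer are hypothetical names used only for this example. The exact capacities are implementation-defined, so the code only relies on the guarantee that capacity() is never smaller than size().

```cpp
#include <cstddef>
#include <string>

// Hypothetical record of what capacity() reported before and after.
struct Growth {
    std::size_t capacity_before;
    std::size_t capacity_after;
    std::size_t size_after;
};

// Assigns a short value, then a much longer one, to the same string.
Growth assign_longer() {
    std::string a;
    a = "less";
    Growth g;
    g.capacity_before = a.capacity();   // implementation-defined, but >= 4
    a = "a much longer value than the four characters stored before";
    g.capacity_after = a.capacity();    // must now be >= the new size
    g.size_after = a.size();
    return g;
}
```

Printing the three fields on your own compiler shows how its string implementation chooses to grow the buffer.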

Quassnoi
A: 

If you look at the docs for the STL string class (I believe the SGI docs are compliant with the spec), many of the methods list complexity guarantees. I believe many of the complexity guarantees are intentionally left vague to allow different implementations. I think some implementations actually use a copy-on-modify approach such that assigning one string to another is a constant-time operation, but you may incur an unexpected cost when you try to modify one of those instances. I'm not sure whether that's still true in modern STL implementations, though.

You should also check out the capacity() function, which will tell you the maximum length string you can put into a given string instance before it will be forced to reallocate memory. You can also use reserve() to cause a reallocation to a specific amount if you know you're going to be storing a large string in the variable at a later time.
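
Here is a minimal sketch of the reserve() idea. The function name append_without_reallocating is made up for this example, and it assumes (as mainstream implementations behave) that appending within the reserved capacity leaves capacity() unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reserves room for n characters up front, appends n
// characters, and reports whether the capacity stayed the same throughout.
bool append_without_reallocating(std::size_t n) {
    std::string s;
    s.reserve(n);                          // ask for room for at least n chars
    const std::size_t cap = s.capacity();  // guaranteed to be >= n
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');                  // stays within the reserved buffer
    }
    return s.size() == n && s.capacity() == cap;
}
```

Without the reserve() call, the same loop would typically trigger several reallocations as the buffer grows.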

As others have said, as far as your examples go, you should really favor initialization over other approaches to avoid the creation of temporary objects.

rmeador
copy-on-write is generally no longer used, because it becomes inefficient in a multithreaded environment.
jalf
+7  A: 

All the following is what a naive compiler would do. Of course as long as it doesn't change the behavior of the program, the compiler is free to make any optimization.

string a;
a = "hello!";

First you initialize a to contain the empty string. (set length to 0, and one or two other operations). Then you assign a new value, overwriting the length value that was already set. It may also have to perform a check to see how big the current buffer is, and whether or not more memory should be allocated.

string *a;
a = new string("hello!");
...
delete(a);

Calling new requires the OS and the memory allocator to find a free chunk of memory. That's slow. Then you initialize it immediately, so you don't assign anything twice or require the buffer to be resized, like you do in the first version. Then something bad happens, and you forget to call delete, and you have a memory leak, in addition to a string that is extremely slow to allocate. So this is bad.

string a;
a = "less"; 
a = "moreeeeeee";

Like in the first case, you first initialize a to contain the empty string. Then you assign a new string, and then another. Each of these may require a call to new to allocate more memory. Each line also requires length, and possibly other internal variables to be assigned.

Normally, you'd allocate it like this:

string a = "hello";

One line, perform initialization once, rather than first default-initializing, and then assigning the value you want.

It also minimizes errors, because you don't have a nonsensical empty string anywhere in your program. If the string exists, it contains the value you want.
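
To see the difference between the two forms without reading compiler output, one option is a small stand-in class that counts which special members run. Traced and both functions below are hypothetical names for illustration, not part of std::string; the sketch only shows which operations the language asks for, not what an optimizer finally emits.

```cpp
#include <string>

// Hypothetical stand-in for std::string that counts which members are called.
struct Traced {
    static int default_ctor_calls;
    static int converting_ctor_calls;
    static int assignment_calls;

    std::string value;

    Traced() { ++default_ctor_calls; }
    Traced(const char* s) : value(s) { ++converting_ctor_calls; }
    Traced& operator=(const char* s) {
        value = s;
        ++assignment_calls;
        return *this;
    }
};

int Traced::default_ctor_calls = 0;
int Traced::converting_ctor_calls = 0;
int Traced::assignment_calls = 0;

// Default-construct first, then assign: two counted operations.
void declare_then_assign() {
    Traced a;
    a = "hello!";
}

// Initialize in the declaration: one counted operation.
void initialize_directly() {
    Traced a = "hello!";
}
```

Calling declare_then_assign() bumps the default-constructor and assignment counters, while initialize_directly() bumps only the converting-constructor counter.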

About memory management, google RAII. In short, string calls new/delete internally to resize its buffer. That means you never need to allocate a string with new. The string object has a fixed size, and is designed to be allocated on the stack, so that the destructor is automatically called when it goes out of scope. The destructor then guarantees that any allocated memory is freed. That way, you don't have to use new/delete in your user code, which means you won't leak memory.

jalf
string a = "hello"; can also be written by calling the constructor directly: string a("hello"); or, in C++0x: string a = { "hello" };
Dean Michael
Yep, it wouldn't be C++ if there wasn't some ambiguity with at least 3 ways to do the same thing, would it? ;) But they all have the same effect. The string is created and initialized in the constructor, rather than first calling the constructor, and then assigning afterwards.
jalf
A: 

Creating a string directly on the heap is usually not a good idea, just as with built-in types. It's not worth it, since the object can easily stay on the stack, and it has the copy constructor and assignment operator needed for efficient copying.

The std::string itself has a buffer on the heap that may be shared by several strings, depending on the implementation.

For instance, with Microsoft's STL implementation you could do this:

string a = "Hello!";
string b = a;

And both strings would share the same buffer until you changed one of them:

a = "Something else!";

That's why it was very bad to store the result of c_str() for later use; c_str() guarantees validity only until the next non-const member function is called on that string object.

This led to very nasty concurrency bugs, which required this sharing functionality to be turned off with a define if you used such strings in a multithreaded application.
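
Whether or not an implementation shares buffers behind the scenes, a copied std::string behaves as an independent value. The sketch below (the function name copy_survives_reassignment is hypothetical) shows the safe alternative to holding on to a c_str() pointer: keep a std::string copy instead.

```cpp
#include <string>

// Hypothetical helper: returns the copy's contents after the original changes.
std::string copy_survives_reassignment() {
    std::string a = "Hello!";
    std::string b = a;        // b is a logically independent copy of a
    a = "Something else!";    // may reallocate or unshare a's buffer
    // A const char* taken from a.c_str() before this point could now dangle;
    // b is unaffected either way.
    return b;
}
```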

Coincoin
A: 

Most likely

   string a("hello!");

is faster than anything else.

Mihai Nita
A: 

You're coming from Java, right? In C++, objects are treated the same (in most ways) as the basic value types. Objects can live on the stack or in static storage, and be passed by value. When you declare a string in a function, that allocates on the stack however many bytes the string object takes. The string object itself does use dynamic memory to store the actual characters, but that's transparent to you. The other thing to remember is that when the function exits and the string you declared is no longer in scope, all of the memory it used is freed. No need for garbage collection (RAII is your best friend).

In your example:

string a;
a = "less"; 
a = "moreeeeeee";

This puts a block of memory on the stack and names it a, then the constructor is called and a is initialized to an empty string. The compiler stores the bytes for "less" and "moreeeeeee" in (I think) the .rdata section of your exe. String a will have a few fields, like a length field and a char* (I'm simplifying greatly). When you assign "less" to a, the operator=() method is called. It dynamically allocates memory to store the input value, then copies it in. When you later assign "moreeeeeee" to a, the operator=() method is again called and it reallocates enough memory to hold the new value if necessary, then copies it in to the internal buffer.

When string a's scope exits, the string destructor is called and the memory that was dynamically allocated to hold the actual characters is freed. Then the stack frame is popped and the memory that held a is no longer "on" the stack.
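
The lifetime described above can be checked with a counting allocator. This is only a sketch: CountingAllocator, CountedString, the two global counters and live_blocks_after_scope are all names invented for this example, and it assumes a C++11 or later standard library that accepts a minimal allocator. The second value is deliberately long so that it cannot fit in any small internal buffer and has to be allocated dynamically.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical global counters for this example only.
int g_allocs = 0;
int g_frees = 0;

// Minimal allocator that counts every allocation and deallocation.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocs;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        ++g_frees;
        ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return false;
}

// Same as std::string, except that it allocates through the counter.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Returns how many blocks are still allocated once the string's scope ends.
int live_blocks_after_scope() {
    {
        CountedString a;
        a = "less";
        a = "a considerably longer string that cannot fit in a small buffer";
    }   // a's destructor runs here and releases its buffer
    return g_allocs - g_frees;
}
```

If every allocation is matched by a deallocation, the function returns 0, with no delete anywhere in user code.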

Rob K