views:

109

answers:

4

I'm pretty new to C++, but I know you can't just use memory willy nilly like the std::string class seems to let you do. For instance:

std::string f = "asdf";
f += "fdsa";

How does the string class handle getting larger and smaller? I assume it allocates a default amount of memory and if it needs more, it news a larger block of memory and copies itself over to that. But wouldn't that be pretty inefficient to have to copy the whole string every time it needed to resize? I can't really think of another way it could be done (but obviously somebody did).

And for that matter, how do all the stdlib classes like vector, queue, stack, etc handle growing and shrinking so transparently?

+4  A: 

Usually, there's a doubling algorithm. In other words, when it fills the current buffer, it allocates a new buffer that's twice as big, and then copies the current data over. This results in fewer allocate/copy operations than the alternative of growing by a single allocation block.

Steven Sudit
So it does have to copy itself over on every memory allocation?
Thomas T.
Absolutely. There's no way to avoid this, although you could certainly minimize it by increasing the initial capacity.
Steven Sudit
Ok, thanks very much. My class isn't so antiquated as I thought :)
Thomas T.
@Thomas T: Yes, but that turns out to not be a major problem. C#, for example, copies the string after *every operation* on the string, rather than only when more storage is required; yet C# is still quite an efficient language.
Billy ONeal
@Billy, it's also drilled into C# programmers' heads that they should use StringBuilder instead of String a lot of the time.
Rob Kennedy
It need not be doubling, every other factor is also fine. Microsoft seems to prefer 1.5 for example, at least in their `std::vector` implementation.
FredOverflow
@FredOverflow - Not *every* other factor is fine. 1.0000001, for example, would hardly be effective :p. But yes, factors such as 1.5 also provide reasonable efficiency. It comes down to trying to mitigate both memory overhead by keeping the load factor down, while also trying to avoid too many copies as they take time. It's a balancing act, like a lot of computing.
Stephen
@Rob Kennedy: This is true. Just saying that it's not unreasonable efficiency wise to copy the string as a result of increasing it's allocation.
Billy ONeal
@Billy: It *can* be. For example, in the days of ASP/VB, concatenating `BSTR`'s was a huge source of slowness, precisely because it required reallocating and copying each time. `StringBuilder` in .NET fixes this, but only if you know to use it. As it happens, `StringBuilder` uses a doubling algorithm, too.
Steven Sudit
@Stephen: You misunderstood me. Note the words *increasing the allocation* -- I'm not talking about immutable strings. Oh, and StringBuilder does not use a doubling algorithm; internally it looks like a rope. (At least according to Reflector examined source code)
Billy ONeal
@Billy: Wrong Steven. :p
Stephen
@Billy: It's "Steven". Not that I'm particularly sensitive about it, but if you spell it wrong then SO won't notify me. In any case, what you said about `StringBuilder` is new to .NET 4.0. In 3.5, it's just a string, although it uses internal methods to make it effectively mutable.
Steven Sudit
@Steven: Sorry bout the renaming issue. :P
Billy ONeal
Actually there was a discussion a while ago on c++.comp.lang.moderated regarding the growing factor. Measures showed that using a factor inferior to 2 was better, because allocators usually ended up allocating power of 2 bytes block, and thus it was better to grow slower than that. I remember that Alexandrescu was involved, I guess those interested could look it up.
Matthieu M.
@Matt: What I've seen are attempts to match up the allocation with the granularity of the allocation blocks, so that the number of characters plus any overhead is just under the limit.
Steven Sudit
+2  A: 

Although I do not know the exact implementation of std::string, most data structures that need to handle dynamic memory growth do so by doing exactly what you say - allocate a default amount of memory, and if more is needed then create a bigger block and copy yourself over.

The way you get around the obvious inefficiency problem is to allocate more memory than you need. The ratio of used memory:total memory of a vector/string/list/etc is often called the load factor (also used for hash tables in a slightly different meaning). Usually it's a 1:2 ratio - that is, you assign twice the memory you need. When you run out of space, you assign a new amount of memory twice your current amount and use that. This means that over time, if you continue to add things to the vector/string/etc, you need to copy over the item less and less (as the memory creation is exponential, and your inserting of new items is of course linear), and so the time taken for this method of memory handling is not as large as you might think. By the principles of Amortized Analysis, you can then see that inserting m items into a vector/string/list using this method is only Big-Theta of m, not m2.

Stephen
I can't agree with the note, but the rest is quite good.
Steven Sudit
@Steven - In which case I am torn. Above there are people claiming that C# copies a string after every movement, and it isn't generally seen as slow... But my theoretical mind of course goes "Argh! n^2! n^2! Argh!"... XD. I'll remove the note, I think.
Stephen
@Stephen: C# `string` (or, rather, .NET `System.String`) is *not* directly comparable to `std::string`. In .NET 3.5, the closest thing was a `StringBuilder`, which was a mutable collection of characters that allowed in-place modification of a contiguous buffer. It also allowed creation of immutable snapshots using `ToString()`. In .NET 4.0, there's a class of the same name and general abilities, but its implementation is segmented, hence it has less in common with `std::string`. The key point is that, regardless, it's not seen as acceptable to simply append to a `string` repeatedly.
Steven Sudit
+2  A: 

Your analysis is correct — it is inefficient to copy the string every time it needs to resize. That's why common advice discourages that use pattern. Use the string's reserve function to ask it to allocate enough memory for what you intend to store in it. Then further operations will fill that memory. (But if your hint turns out to be too small, the string will still grow automatically, too.)

Containers will also usually try to mitigate the effects of frequent re-allocation by allocating more memory than they need. A common algorithm is that when a string finds that it's out of space, it doubles its buffer size instead of just allocating the minimum required to hold the new value. If the string is being grown one character at a time, this doubling algorithm reduces the time complexity to amortized linear time (instead of quadratic time). It also reduces the program's susceptibility to memory fragmentation.

Rob Kennedy
A: 

And remember that std:: vector, queue, stack really only store a pointer to the object in the vector, the real data is stored separately so a resize and copy is only copying the pointers.

Martin Beckett
Granted, a `std::vector<std::string>` can be resized without moving the buffers with the characters in them. Generally speaking, though, a vector stores the object itself, not a pointer. It's just that, sometimes, the object itself has a pointer to a larger memory area.
Steven Sudit