In the C++ Standard Library, std::string has a public member function capacity() which returns the size of the internal allocated storage, a value greater than or equal to the number of characters in the string (according to the linked reference). What can this value be used for? Does it have something to do with custom allocators?
It could be used for performance tuning if you are about to add a lot of characters to the string. Before starting the string manipulation, you can check the capacity and, if it is too small, reserve the desired length in a single step instead of letting the string reallocate successively bigger chunks of memory several times, which can hurt performance.
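As a rough sketch of that "check, then reserve once" idea: the helper names ensure_room and repeat_char below are made up for illustration, and only size(), capacity() and reserve() come from std::string itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: if `s` cannot already hold `extra` more characters,
// grow it once, up front. Returns true if reserve() was called.
bool ensure_room(std::string& s, std::size_t extra) {
    const std::size_t needed = s.size() + extra;
    if (s.capacity() >= needed) {
        return false;               // enough room already, nothing to do
    }
    s.reserve(needed);              // one request instead of several regrowths
    assert(s.capacity() >= needed); // reserve() gives at least what was asked
    return true;
}

// Hypothetical usage: build a string of `n` copies of `c` after a single
// up-front reservation.
std::string repeat_char(char c, std::size_t n) {
    std::string out;
    ensure_room(out, n);
    for (std::size_t i = 0; i < n; ++i) {
        out += c;
    }
    return out;
}
```

Whether this is measurably faster depends on the implementation's growth policy and on how much is appended, so it is worth profiling before relying on it.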
You are more likely to use the reserve() member function, which sets the capacity to at least the supplied value.
The capacity() member function itself might be used to avoid allocating memory. For instance, you could recycle used strings through a pool, and put each one in a different size bucket based on its capacity. A client of the pool could then ask for a string that already has some minimum capacity.
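One possible shape for such a pool, as a minimal sketch: the StringPool class and its release/acquire/available members are hypothetical names, and the code assumes that clear() and moving a std::string keep the heap buffer, which mainstream implementations do but the standard does not strictly promise.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical pool that files recycled strings under their capacity.
class StringPool {
public:
    // Return a string to the pool, keyed by the capacity it has right now.
    void release(std::string s) {
        s.clear();  // drops the contents; the buffer is kept in practice
        const std::size_t cap = s.capacity();
        free_.emplace(cap, std::move(s));
    }

    // Hand out an empty string with capacity() >= min_capacity, reusing the
    // smallest pooled string that is big enough if there is one.
    std::string acquire(std::size_t min_capacity) {
        std::string s;
        auto it = free_.lower_bound(min_capacity);
        if (it != free_.end()) {
            s = std::move(it->second);  // assumed to take over the buffer
            free_.erase(it);
        }
        s.reserve(min_capacity);        // no-op when the buffer is big enough
        assert(s.capacity() >= min_capacity);
        return s;
    }

    std::size_t available() const { return free_.size(); }

private:
    std::multimap<std::size_t, std::string> free_;
};
```

The multimap keeps the buckets ordered, so lower_bound finds the tightest fit; a fixed set of size classes would be another reasonable design.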
It gives you the number of characters the string could contain without having to re-allocate. I suppose this might be important in a situation where allocation was expensive, and you wanted to avoid it, but I must say this is one string member function I've never used in real code.
Strings have a capacity and a size. The capacity indicates how many characters the string can hold before it has to allocate more memory. The size indicates how many characters it currently holds. reserve() can be used to set the minimum capacity of the string (it will allocate memory for at least that number of characters but could allocate more).
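To make the size/capacity distinction concrete, here is a small sketch. The Storage struct and the describe and after_reserve functions are invented for this example, and the exact capacity values are implementation-specific, so only the inequalities are relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical snapshot of the two quantities discussed above.
struct Storage {
    std::size_t size;      // characters currently held
    std::size_t capacity;  // characters that fit without reallocating
};

Storage describe(const std::string& s) {
    assert(s.capacity() >= s.size());  // always true for std::string
    return {s.size(), s.capacity()};
}

// Take a copy, reserve room for at least `n` characters, and report the
// result. The size is unchanged; only the capacity may grow.
Storage after_reserve(std::string s, std::size_t n) {
    s.reserve(n);
    return describe(s);
}
```

For example, after_reserve("hello", 100) reports a size of 5 and a capacity of at least 100; the actual capacity may be larger.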
This is primarily of importance if you're increasing the size of the string. When you concatenate onto the string with += or append(), the characters from the given string are added to the end of the current one. If the new size does not exceed the capacity, the string just uses the storage it already has. However, if the new size would exceed the current capacity, the string has to reallocate memory internally and copy its contents into the new memory. If you're going to be doing that a lot, it can get expensive (though appending is amortized constant time), so in such a case you could use reserve() to preallocate enough memory to reduce how often reallocations have to take place.
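The effect can be observed by watching capacity() while appending. In this sketch the function name count_capacity_changes is made up, and a change in capacity() is used as a stand-in for "a reallocation happened", which is an assumption about how implementations behave.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append `count` characters one at a time and count how often capacity()
// changes along the way. Optionally reserve the full amount first.
std::size_t count_capacity_changes(std::size_t count, bool reserve_first) {
    std::string s;
    if (reserve_first) {
        s.reserve(count);  // preallocate once, before the loop
    }
    std::size_t changes = 0;
    std::size_t last = s.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        s += 'x';
        if (s.capacity() != last) {  // the string had to grow its storage
            ++changes;
            last = s.capacity();
        }
    }
    assert(s.size() == count);
    return changes;
}
```

With reserve_first set, the count should come out as zero on typical implementations; without it, the number of changes depends on the library's growth factor.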
vector works in basically the same way, with the same member functions.
Personally, while I've dealt with capacity() and reserve() with vector from time to time, I've never seen much need to do so with string - probably because I don't generally do enough string concatenations in my code for it to be worth it. In most cases, a particular string might get a few concatenations but not enough to worry about its capacity. Worrying about capacity is generally something you do when trying to optimize your code.
There's hardly any relevant use. It is similar to std::vector::capacity(). However, one of the most common uses of strings is assignment. When assigning to a std::string, its capacity() may change. This means that an implementation has the right to ignore the old capacity and allocate precisely enough memory.
It genuinely isn't very useful, and is probably there only for symmetry with vector (under the assumption that both will operate internally in the same way).
The capacity of a vector is guaranteed to affect the behaviour of a resize. Resizing a vector to a value less than or equal to the capacity will not induce a reallocation, and will not therefore invalidate iterators or pointers referring to elements in the vector. This means you can pre-allocate some storage by calling reserve on a vector, then (with care) add elements to it by resizing or pushing back (etc.), safe in the knowledge that the underlying buffer won't move.
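A small sketch of that vector guarantee, checked by comparing the buffer address before and after filling up to the reserved capacity. The function name buffer_stays_put is hypothetical; reserve(), push_back() and data() are the standard std::vector members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reserve room for `n` ints, push back exactly `n` of them, and report
// whether the underlying buffer stayed at the same address throughout.
bool buffer_stays_put(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                   // pre-allocate the storage
    const int* before = v.data();   // address of the reserved buffer
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));  // stays within capacity()
    }
    assert(v.size() <= v.capacity());
    return v.data() == before;      // no reallocation, so no move
}
```

Pushing back even one element beyond the reserved capacity would void the guarantee, which is the "with care" part of the paragraph above.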
There is no such guarantee for string, though. It seems that the capacity is for informational purposes only, though even that's a stretch, as it doesn't look like there's any useful information to be taken from it anyway. (Worse yet, before C++11 contiguity of string chars wasn't guaranteed either, so the only way to get at the string as a linear buffer was c_str(), which could induce a reallocation. Since C++11 the characters are required to be stored contiguously.)
At a guess, string was presumably originally intended to be implemented as some kind of a special case of vector, but over time the two grew apart...