In the C++ Standard Library, std::string has a public member function capacity() which returns the size of the internal allocated storage, a value greater than or equal to the number of characters in the string (according to here). What can this value be used for? Does it have something to do with custom allocators?

+6  A: 
Viktor Sehr
+2  A: 

It could be used for some performance tuning if you are about to add a lot of characters to the string. Before starting the string manipulation, you can check the capacity and, if it is too small, reserve the desired length in a single step (instead of letting the string reallocate successively bigger chunks of memory several times, which would hurt performance).
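A minimal sketch of that check-then-reserve pattern might look like the following. The helper name `append_many` is made up for illustration and is not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a standard function): appends `count` copies of
// `ch` to `s`, growing the buffer at most once up front.
void append_many(std::string& s, std::size_t count, char ch) {
    const std::size_t needed = s.size() + count;
    if (s.capacity() < needed) {
        s.reserve(needed);  // a single allocation instead of several growth steps
    }
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back(ch);    // stays within the reserved capacity
    }
    assert(s.capacity() >= needed);
}
```

Calling `append_many(s, 1000, 'x')` on a short string should then allocate once, rather than every time the growing string outruns its buffer.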

Péter Török
I see, but why not just call reserve in the first place (and forget about the capacity), just to make sure? I mean, if you're going to insert the characters anyway does it make a difference whether you know the capacity beforehand?
dreamlax
@dreamlax Well, I believe that the implementation of `reserve()` *could* choose to allocate more memory, even if it didn't need to, and while `capacity()` is likely inlined, `reserve()` would be less likely to, since it would be bigger (so calling `capacity()` could be cheaper in the case where you wouldn't need `reserve()`). However, in reality, it's likely more a completeness thing. Someone might like to know the capacity and might have a really good reason to need it that the library designers couldn't foresee. While generally unnecessary, it doesn't hurt to have it, and it could be useful.
Jonathan M Davis
+10  A: 

You are more likely to use the reserve() member function, which sets the capacity to at least the supplied value.

The capacity() member function itself might be used to avoid allocating memory. For instance, you could recycle used strings through a pool, and put each one in a different size bucket based on its capacity. A client of the pool could then ask for a string that already has some minimum capacity.
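As a rough sketch of such a pool, assuming strings are handed around as `std::unique_ptr<std::string>` and bucketed in a `std::multimap` keyed by capacity (the `StringPool` class and its member names are hypothetical, chosen only for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical pool: recycles heap-allocated strings, bucketed by capacity().
class StringPool {
public:
    // Returns an empty string whose capacity is at least `min_capacity`.
    std::unique_ptr<std::string> acquire(std::size_t min_capacity) {
        std::unique_ptr<std::string> s;
        auto it = pool_.lower_bound(min_capacity);  // smallest bucket that fits
        if (it != pool_.end()) {
            s = std::move(it->second);
            pool_.erase(it);
        } else {
            s = std::make_unique<std::string>();
        }
        if (s->capacity() < min_capacity) {
            s->reserve(min_capacity);  // only allocates when nothing suitable was pooled
        }
        assert(s->capacity() >= min_capacity);
        return s;
    }

    // Hands a string back to the pool; its contents are discarded.
    void release(std::unique_ptr<std::string> s) {
        s->clear();  // in practice this keeps the allocated buffer
        const std::size_t key = s->capacity();
        pool_.emplace(key, std::move(s));
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::multimap<std::size_t, std::unique_ptr<std::string>> pool_;
};
```

Whether this is worth it over a plain `reserve()` call is something only profiling can tell; the sketch just shows where `capacity()` would come into play.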

Marcelo Cantos
I had never thought of reusing strings before. Is this kind of optimisation common? Perhaps on systems with very limited resources.
dreamlax
It's definitely an optimization. Memory allocations and deallocations can be expensive. So, in programs where it's found to be a bottleneck, programmers have been known to create memory pools rather than always allocating or deallocating memory. String operations being fairly expensive in general (particularly when it's quite common to create and throw away strings fairly quickly), they'd be a prime target. However, it's not the kind of thing that you'd do unless profiling showed that it would really benefit performance.
Jonathan M Davis
It certainly isn't common, since very few applications have such demanding performance needs. Also, it is more about avoiding malloc-thrashing than memory constraints, especially in heavily concurrent apps, which suffer a performance hit when they manipulate the heap too frequently.
Marcelo Cantos
The usual approach would not be having pools of strings, but rather having a custom allocator with memory pools that makes fast allocations and can be used with any and all types instead of only strings. In fact I would avoid considering a string pool an optimization at all...
David Rodríguez - dribeas
For a string pool to work, the strings would have to be on the heap. Most strings are in fact members of another structure, or on the stack. A dedicated allocator is a much better solution, as the string buffer often is on the heap (modulo SSO)
MSalters
Oh, and it wouldn't work anyway. std::string doesn't have to respect `capacity` on assignment; it may reallocate to shrink.
MSalters
Custom allocators are quite problematic (largely owing to the fact that they are part of the string's type) and are impractical to use in many situations. I avoid them like the plague.
Marcelo Cantos
You wouldn't assign a string back into a pool, only a pointer or smart pointer.
Marcelo Cantos
You beat Neil Butterworth by 2 minutes in describing a use for `capacity()` that seems valid (i.e. trying to avoid a costly reallocation).
dreamlax
+3  A: 

It gives you the number of characters the string could contain without having to re-allocate. I suppose this might be important in a situation where allocation was expensive, and you wanted to avoid it, but I must say this is one string member function I've never used in real code.

anon
+2  A: 

Strings have a capacity and a size. The capacity indicates how many characters the string can hold before it has to allocate more memory. The size indicates how many characters it currently holds. reserve() can be used to set the minimum capacity of the string (it will allocate memory for at least that number of characters but could allocate more).

This is primarily of importance if you're increasing the size of the string. When you concatenate onto the string with += or append(), the characters from the given string are added to the end of the current one. If the new size does not exceed the capacity, the string just uses the storage it already has. However, if the new size would exceed the current capacity, the string has to reallocate memory internally and copy its contents into the new memory. If you're going to be doing that a lot, it can get expensive (though appending is amortized constant time), so in such a case you could use reserve() to preallocate enough memory to reduce how often reallocations have to take place.
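To see the effect, here is a small sketch that counts how often the capacity changes while appending one character at a time. The function name `count_reallocations` is invented for this example, and the exact count without reserve() depends on the implementation's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one by one and counts how often
// capacity() changes, which is used here as a stand-in for "a reallocation happened".
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::string s;
    if (reserve_first) {
        s.reserve(n);  // preallocate room for all n characters
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'x';
        if (s.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = s.capacity();
        }
    }
    assert(s.size() == n);
    return reallocations;
}
```

With reserve() up front the count should be zero; without it, the string grows in several steps on its way to n characters.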

std::vector works in basically the same way, with the same member functions.

Personally, while I've dealt with capacity() and reserve() with vector from time to time, I've never seen much need to do so with string - probably because I don't generally do enough string concatenations in my code for it to be worth it. In most cases, a particular string might get a few concatenations but not enough to worry about its capacity. Worrying about capacity is generally something you do when trying to optimize your code.

Jonathan M Davis
I understand the use of `reserve`, but why is it important to know the string's current capacity?
dreamlax
You might know roughly how much capacity you're going to need and would like to check whether the string that you're about to add to is large enough. If you're doing multiple concatenations to it and you know roughly how many characters that will be, it would be more efficient to just call `reserve()` first and avoid multiple reallocations. But you wouldn't necessarily know whether you needed to if you didn't check the capacity first. In most cases, however, you'd probably be dealing with a string that you just created, and you'd just call `reserve()` and not bother with `capacity()`.
Jonathan M Davis
+1  A: 

There's hardly any relevant use. It is similar to std::vector::capacity. However, one of the most common uses of strings is assignment. When assigning to a std::string, its .capacity may change. This means that an implementation has the right to ignore the old capacity and allocate precisely enough memory.

MSalters
+1  A: 

It genuinely isn't very useful, and is probably there only for symmetry with vector (under the assumption that both will operate internally in the same way).

The capacity of a vector is guaranteed to affect the behaviour of a resize. Resizing a vector to a value less than or equal to the capacity will not induce a reallocation, and will not therefore invalidate iterators or pointers referring to elements in the vector. This means you can pre-allocate some storage by calling reserve on a vector, then (with care) add elements to it by resizing or pushing back (etc.), safe in the knowledge that the underlying buffer won't move.
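A small sketch of that guarantee, with a made-up function name `buffer_stays_put`: after reserve(n), pushing back up to n elements must not move the underlying buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: reserves room for n ints, pushes back n elements, and
// reports whether the underlying buffer stayed at the same address.
bool buffer_stays_put(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                    // capacity() is now at least n
    const int* before = v.data();    // address of the reserved buffer
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));  // size never exceeds capacity, so no reallocation
    }
    assert(v.size() <= v.capacity());
    return v.data() == before;
}
```

Pointers and iterators taken after the reserve() call remain valid in the same way, as long as the size never exceeds the reserved capacity.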

There is no such guarantee for string, though. It seems that the capacity is for informational purposes only, though even that's a stretch, as it doesn't look like there's any useful information to be taken from it anyway. (Worse yet, before C++11 contiguity of string chars wasn't guaranteed either, so the only way you could get at the string as a linear buffer was c_str(), which could induce a reallocation. C++11 and later do guarantee contiguous storage.)

At a guess, string was presumably originally intended to be implemented as some kind of a special case of vector, but over time the two grew apart...

brone