A rather quick (and therefore probably incomplete) shot at answering some of the questions:
- Why is it coded as a template?
Templates let the same class code work on arbitrary data types. For example, the basic_string<> template class can work on char units (which is what the std::string typedef does), on wchar_t units (std::wstring), or, in principle, on other POD character types (given a suitable char_traits class). Using something other than char or wchar_t is unusual (std::vector<> would more likely be used), but the possibility exists.
- How does it do mem allocation?
The details aren't specified by the standard. In fact, the basic_string<> template allows an arbitrary allocator to be used for the actual allocation of memory (but doesn't determine at what points allocations might be requested). Some implementations might store short strings in actual class members and only allocate dynamically when the strings grow beyond a certain size. The size requested might be exactly what's needed to store the string, or it might be larger (for example, a multiple of the current size) to allow for growth without a reallocation.
Additional information stolen from another SO answer:
Scott Meyers's book, Effective STL, has a chapter on std::string implementations that's a decent overview of the common variations: "Item 15: Be aware of variations in string implementations".
He talks about 4 variations:
several variations on a ref-counted implementation (commonly known as copy on write): when a string object is copied unchanged, the refcount is incremented but the actual string data is not copied. Both objects point to the same refcounted data until one of them modifies it, causing a 'copy on write' of the data. The variations are in where things like the refcount, locks, etc. are stored.
a "short string optimization" implementation. In this variant, the object contains the usual pointer to the data, the length, the size of the dynamically allocated buffer, etc. But if the string is short enough, it uses that area to hold the string instead of dynamically allocating a buffer.
- Why is it (or isn't it) better than mere null-terminated char arrays?
One way the string class is better than a mere null-terminated array is that the class manages the memory required, so defects involving allocation errors or overrunning the end of the allocated array are reduced. Another (perhaps minor) benefit is that you can store null characters in the string. A drawback is that there may be some overhead, particularly because you pretty much have to rely on dynamic memory allocation with the string class. In most scenarios that's probably not a major issue, but on some setups (embedded systems, for example) it can be a problem.