What is the overhead in the string structure that causes sizeof() to be 32 ?
Some std::string
implementations save very small strings directly on the stack in a statically sized char
array instead of using dynamic heap storage. This allows to avoid heap allocations for lots of small string objects and improves locality of reference.
Furthermore, there will be a std::size_t
member to save the strings size and a (potentially unused, see above) pointer to the heap storage.
std::string
typically contains a buffer for the "small string optimization" --- if the string is less than the buffer size then no heap allocation is required.
It is library dependent. You shouldn't rely on the size of std::string
objects because it is likely to change in different environments (obviously between different standard library vendors, but also between different versions of the same library).
Keep in mind that std::string
implementations are written by people who have optimized for a variety of use cases, typically leading to 2 internal representations, one for short strings (small internal buffer) and one for long strings (heap-allocated external buffer). The overhead is associated to holding both of these inside each std::string
object.
My guess is:
class vector
{
char type;
struct Heap
{
char* start;
char* end;
char* allocatedEnd;
};
struct Stack
{
char size;
char data[27];
}
union
{
Stack stackVersion;
Heap heapVersion;
} version;
};
But I bet there are hundreds of ways of doing it.
Q: Why is a dog yellow? A: It's not necessarily.
The size of a (an?) std::string object is implementation-dependent. I just checked MS VC++ 2010. It does indeed use 32 bytes for std::string. There is a 16 byte union that contains either the text of the string, if it will fit, or a pointer to heap storage for longer strings. If the implementers had chosen to keep 18 byte strings in the string object rather than on the heap, the size would be 34 bytes. The other 16 bytes comprise overhead, containing such things as the length of the string and the amount of memory currently allocated for the string.
A different implementation might always allocate memory from the heap. Such an implementation would undoubtedly require less memory for the string object.