What is the overhead in the std::string structure that causes sizeof() to be 32?

+9  A: 

Some std::string implementations store very small strings directly inside the string object, in a statically sized char array, instead of using dynamic heap storage. This avoids heap allocations for lots of small string objects and improves locality of reference.

Furthermore, there will be a std::size_t member to store the string's size and a (potentially unused, see above) pointer to the heap storage.
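A minimal sketch of how one might observe this, assuming nothing beyond the standard std::string interface: the helper below (stored_inline is a made-up name, not a library function) checks whether the character data lives inside the string object itself. What it reports for short strings depends entirely on the library in use.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of the standard library.
// Returns true if the character data of `s` lies inside the std::string
// object itself (in-object buffer) rather than in a separate allocation.
bool stored_inline(const std::string& s)
{
    // Compare the addresses as integers to avoid relational comparison
    // of pointers into unrelated objects.
    const auto object = reinterpret_cast<std::uintptr_t>(&s);
    const auto data   = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object && data < object + sizeof(std::string);
}
```

On a library that uses the optimization, I would expect stored_inline(std::string("hi")) to return true. A 1000-character string cannot fit inside the object, so it should return false everywhere.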

Konrad Rudolph
Ah. And why was this downvoted?
Konrad Rudolph
It seems to be me, but I didn't do it intentionally. Sorry about that!
Bill
@Bill: no sweat! Already happened to me, too.
Konrad Rudolph
+5  A: 

std::string typically contains a buffer for the "small string optimization": if the string is shorter than the buffer, no heap allocation is required.

Anthony Williams
Where "typically" == "on Windows" ;-)
Steve Jessop
Windows compilers aren't the only ones that do the small-string optimization
Anthony Williams
Sure, but if you're not willing to name them then it's hard to judge whether this is "typical" behaviour, or just called that on the grounds that it's the behaviour of a common implementation (and presumably others).
Steve Jessop
From what I understand, Dinkumware and STLPort both do, but gcc's implementation doesn't.
Dennis Zickefoose
Btw, I mention it because "typically" spans a range from "I'm reasonably confident you'll never see anything else", to "50% or more of the implementations I've used do this". It's very easily misunderstood, I think. Neither this optimization, nor the absence of it, should be considered unusual.
Steve Jessop
+2  A: 

It is library dependent. You shouldn't rely on the size of std::string objects because it is likely to change in different environments (obviously between different standard library vendors, but also between different versions of the same library).

Keep in mind that std::string implementations are written by people who have optimized for a variety of use cases, typically leading to two internal representations: one for short strings (small internal buffer) and one for long strings (heap-allocated external buffer). The overhead comes from holding the members for both of these inside each std::string object.
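A rough way to see where a given library switches between the two representations is to grow a string and watch capacity(). This is only a sketch: first_growth_length is a made-up helper, and reading the initial capacity as "the size of the internal buffer" is an assumption that holds on common implementations but is not guaranteed by the standard.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends characters to an empty string until
// capacity() changes, and returns the size at which that happened.
std::size_t first_growth_length()
{
    std::string s;
    const std::size_t initial = s.capacity();
    // capacity() is never smaller than size(), so this loop ends after
    // at most initial + 1 appended characters.
    while (s.capacity() == initial) {
        s.push_back('x');
    }
    return s.size();
}
```

If the library keeps short strings in an internal buffer, the returned length is the first one that no longer fits in it. On a library without such a buffer, the empty string usually has capacity 0 and the function returns 1.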

André Caron
A: 

My guess is:

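class string
{
    char type;              // which member of the union is in use
    struct Heap             // long strings: external buffer
    {
      char*   start;
      char*   end;
      char*   allocatedEnd;
    };
    struct Stack            // short strings: stored in the object itself
    {
      char    size;
      char    data[27];
    };
    union
    {
        Stack   stackVersion;
        Heap    heapVersion;
    } version;
};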

But I bet there are hundreds of ways of doing it.
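One way to sanity-check a guess like this is to let the compiler compute its size. The sketch below uses a stand-in type (GuessedString and guessed_size are made-up names). On common ABIs I would expect 32 bytes with 4-byte pointers and 40 bytes with 8-byte pointers, because the union is padded to pointer alignment, but treat those numbers as an expectation, not a guarantee.

```cpp
#include <cstddef>

// Hypothetical stand-in for the guessed layout; not a real library type.
struct GuessedString
{
    char type;  // which member of the union is in use
    struct Heap  { char* start; char* end; char* allocatedEnd; };
    struct Stack { char size; char data[27]; };
    union
    {
        Stack stackVersion;  // 28 bytes of in-object storage
        Heap  heapVersion;   // three pointers into a heap buffer
    } version;
};

// Size of the guessed layout as computed by the compiler.
constexpr std::size_t guessed_size() { return sizeof(GuessedString); }
```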

Martin York
A: 

Q: Why is a dog yellow? A: It's not necessarily.

The size of a std::string object is implementation-dependent. I just checked MS VC++ 2010. It does indeed use 32 bytes for std::string. There is a 16-byte union that contains either the text of the string, if it fits, or a pointer to heap storage for longer strings. If the implementers had chosen to keep 18-byte strings in the string object rather than on the heap, the size would be 34 bytes. The other 16 bytes are overhead, holding such things as the length of the string and the amount of memory currently allocated for it.

A different implementation might always allocate memory from the heap. Such an implementation would undoubtedly require less memory for the string object.
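To make the arithmetic concrete, here is a sketch of two layouts along those lines. Both types and all member names are made up for illustration and are not the actual definitions from any library. The first comes to 32 bytes when std::size_t is 8 bytes. The second is a heap-only design that keeps the length and capacity in the heap block.

```cpp
#include <cstddef>

// Hypothetical layout loosely modeled on the description above.
struct InlineOrHeapString
{
    union
    {
        char  text[16];  // short strings are stored in place
        char* heap;      // longer strings live on the heap
    } storage;
    std::size_t length;     // current length of the string
    std::size_t allocated;  // amount of memory currently allocated
};

// Hypothetical heap-only layout: one pointer, bookkeeping kept on the heap.
struct HeapOnlyString
{
    char* block;
};
```

The heap-only object is smaller, but every non-empty string then costs an allocation, which is the trade-off the in-object buffer is meant to avoid.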

Jive Dadson