
When allocating an empty BSTR, either with SysAllocString(L"") or with SysAllocStringLen(str, 0), you always get a new BSTR (at least in the test I ran). BSTRs aren't typically shared (unlike Java/.NET string interning) because they are mutable, but an empty string is, for all intents and purposes, immutable.

My question, at long last: why doesn't COM use the trivial optimization of always returning the same string when creating an empty BSTR (and ignoring it in SysFreeString)? Is there a compelling reason not to do so (because my reasoning is flawed), or is it just that it wasn't considered important enough?
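To make the proposal concrete, here is a minimal sketch of the optimization being asked about. It is a hypothetical toy model, not the real OLE Automation API: `toy_alloc_len`, `toy_free` and `shared_empty` are made-up names, and the block layout (4-byte byte-count prefix, UTF-16 code units, 2-byte terminator, with callers holding a pointer to the first code unit) is modeled on the BSTR layout.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t toy_char;   /* stand-in for OLECHAR */

/* The single shared empty string: a zero byte-count prefix followed by a
   terminating code unit, laid out like a (hypothetical) BSTR block. */
static struct {
    uint32_t byte_len;
    toy_char text[1];
} shared_empty = {0, {0}};

/* Hypothetical analogue of SysAllocStringLen: copies len code units from src
   (or zero-fills if src is NULL). Every zero-length request returns the same
   shared instance instead of allocating. */
toy_char *toy_alloc_len(const toy_char *src, uint32_t len)
{
    if (len == 0)
        return shared_empty.text;                 /* the proposed optimization */

    uint32_t bytes = len * (uint32_t)sizeof(toy_char);
    unsigned char *block = malloc(sizeof(uint32_t) + bytes + sizeof(toy_char));
    if (block == NULL)
        return NULL;
    memcpy(block, &bytes, sizeof bytes);          /* byte-count prefix */
    toy_char *text = (toy_char *)(block + sizeof(uint32_t));
    if (src != NULL)
        memcpy(text, src, bytes);
    else
        memset(text, 0, bytes);
    text[len] = 0;                                /* terminator */
    return text;
}

/* Hypothetical analogue of SysFreeString: NULL and the shared empty
   instance are ignored; everything else is released. */
void toy_free(toy_char *s)
{
    if (s == NULL || s == shared_empty.text)
        return;
    free((unsigned char *)s - sizeof(uint32_t));
}
```

With this in place, two calls to `toy_alloc_len(NULL, 0)` return the same pointer, and freeing it any number of times is harmless.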

+3  A: 

I can't speak about what's idiomatic in COM, but in C++, Java, etc., there's an expectation that if you new an object, it will not compare equal (as far as address/object identity is concerned) to any other object. This property is useful when you use identity-based mapping, e.g., as keys in an IdentityHashMap in Java. For that reason, I don't think empty strings/objects should be an exception to this rule.
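The identity-versus-equality distinction can be sketched with the same kind of hypothetical toy string (4-byte byte-count prefix before the text; `toy_alloc_empty`, `same_object` and `same_contents` are made-up names). Two separately allocated empty strings are equal by contents but remain distinct objects, which is exactly what a shared empty instance would take away.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t toy_char;   /* stand-in for OLECHAR */

/* Allocates a fresh zero-length toy string: a zeroed 4-byte prefix plus a
   zeroed terminator. Every call returns a new object. */
toy_char *toy_alloc_empty(void)
{
    unsigned char *block = calloc(1, sizeof(uint32_t) + sizeof(toy_char));
    return block ? (toy_char *)(block + sizeof(uint32_t)) : NULL;
}

void toy_free(toy_char *s)
{
    if (s != NULL)
        free((unsigned char *)s - sizeof(uint32_t));
}

/* Identity: are these the same object (same address)? */
int same_object(const toy_char *a, const toy_char *b)
{
    return a == b;
}

/* Equality: same byte-count prefix and same code units? */
int same_contents(const toy_char *a, const toy_char *b)
{
    uint32_t la, lb;
    memcpy(&la, (const unsigned char *)a - sizeof la, sizeof la);
    memcpy(&lb, (const unsigned char *)b - sizeof lb, sizeof lb);
    return la == lb && memcmp(a, b, la) == 0;
}
```

An identity-keyed table (the `IdentityHashMap` case) can tell `a` and `b` apart here; if every empty allocation returned one shared pointer, it could not.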

Well-written COM objects will allow you to pass NULL to a BSTR parameter and treat it as equivalent to an empty string. (This will not work with MSXML though, as I've learnt the hard way. :-P)

Chris Jester-Young
String interning applies to: 1. string constants, and 2. strings where intern() has been called. But where you use `new`, the object inequality will still apply: `String a = new String("foo"), b = new String("foo"); assert a != b; assert a.equals(b);`
Chris Jester-Young
Also: `assert a.intern() == "foo"; assert b.intern() == "foo";`
Chris Jester-Young
Ah, ok. Learned something new. Sorry, then :-)
Joey
+2  A: 

I'd guess (and yes, it's only a guess) that this optimization wasn't deemed important enough to perform.

While memory consumption was a major factor in the design of many of Windows's older APIs (cf. Raymond Chen's articles), the benefit here is rather small compared to Java's or .NET's string interning, since it applies to a single string that is only six bytes long (a 4-byte length prefix plus a 2-byte terminator). And how many empty strings does a program have to keep in memory at any one time? Is that number large enough to warrant the optimization, or is it negligible?
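The arithmetic behind "only six bytes" is easy to spell out. This sketch assumes the BSTR layout (4-byte length prefix, 2 bytes per UTF-16 code unit, 2-byte terminator) and deliberately ignores heap bookkeeping overhead; the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Payload bytes for a BSTR holding len UTF-16 code units:
   4-byte length prefix + 2 bytes per code unit + 2-byte terminator.
   Allocator/heap overhead is not included. */
size_t bstr_payload_bytes(size_t len)
{
    return sizeof(uint32_t) + len * sizeof(uint16_t) + sizeof(uint16_t);
}

/* Upper bound on what sharing one empty instance could save when `count`
   empty strings are alive at the same time: all but one copy go away. */
size_t empty_sharing_savings(size_t count)
{
    return count > 1 ? (count - 1) * bstr_payload_bytes(0) : 0;
}
```

By this estimate, even 1,000 empty strings alive at once come to 5,994 bytes of payload, before heap overhead.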

Joey
+2  A: 

It isn't COM that allocates the BSTR so much as the Windows subsystem that provides it.

Empty BSTRs cannot share a static instance because there are functions that can reallocate/resize BSTRs; see SysReAllocString. Although the documentation doesn't promise any particular allocation behavior, the caller cannot assume that it will never get the original BSTR back after a call.

SysReAllocString @ MSDN

edit:

Upon some reflection, I realize that even accounting for SysReAllocString, one could start with a shared empty BSTR, call SysReAllocString, and receive a new BSTR without breaking anything. So that objection can be discounted for the sake of argument. My fault.
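Here is a sketch of what that would require, again using a hypothetical toy string rather than the real API (`toy_realloc`, `toy_empty` and `shared_empty` are made-up names; the layout is a 4-byte byte-count prefix before the text). The only rule the reallocation function needs is "never resize or free the shared instance; swap it for a fresh block instead".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t toy_char;   /* stand-in for OLECHAR */

/* Hypothetical shared empty instance: zero byte-count prefix + terminator. */
static struct {
    uint32_t byte_len;
    toy_char text[1];
} shared_empty = {0, {0}};

toy_char *toy_empty(void)
{
    return shared_empty.text;
}

static uint32_t toy_strlen(const toy_char *s)
{
    uint32_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Analogue of SysReAllocString: replaces *pp with a copy of the terminated
   string src. The shared empty instance is never passed to realloc or free;
   it is simply replaced by a fresh heap block. src must not point into *pp.
   Returns 1 on success, 0 if allocation fails (then *pp is unchanged). */
int toy_realloc(toy_char **pp, const toy_char *src)
{
    uint32_t bytes = toy_strlen(src) * (uint32_t)sizeof(toy_char);
    unsigned char *old = NULL;
    if (*pp != NULL && *pp != shared_empty.text)
        old = (unsigned char *)*pp - sizeof(uint32_t);

    unsigned char *block = realloc(old, sizeof(uint32_t) + bytes + sizeof(toy_char));
    if (block == NULL)
        return 0;
    memcpy(block, &bytes, sizeof bytes);          /* byte-count prefix */
    toy_char *text = (toy_char *)(block + sizeof(uint32_t));
    memcpy(text, src, bytes);
    text[bytes / sizeof(toy_char)] = 0;           /* terminator */
    *pp = text;
    return 1;
}

/* Frees a heap string; NULL and the shared instance are ignored. */
void toy_free(toy_char *s)
{
    if (s != NULL && s != shared_empty.text)
        free((unsigned char *)s - sizeof(uint32_t));
}
```

Starting from `toy_empty()` and reallocating gives the caller a new pointer while the shared instance stays empty for everyone else.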

However, I think the idea of an empty BSTR carries more baggage than one might expect. I wrote some test programs to see whether I could get conflicting or interesting results. After running them and tallying up the results, I think the best answer to your question is that it is simply safest for everyone involved if every request gets its own BSTR. There are lots of odd ways to get BSTRs that report different flavors of zero length, both in characters and in bytes. Even if some optimization returned shared instances in some places, there would be plenty of room for confusion between an "empty BSTR" and a BSTR that has zero string length but a real allocation length. For example, a rule such as "a BSTR with no string length need not be freed" could easily lead to aggravating memory leaks (see the tests below regarding byte-allocated BSTRs).

Also, although some COM components accept NULL-pointer (0-valued) BSTRs as arguments, it is unsafe to assume that all COM components do. This is only safe if both the caller and the callee agree to it. The safest convention for everyone is to assume that any BSTR handed over may have zero length, to require that case to be handled, and to require a value that isn't a NULL pointer. At the very least, this makes it much easier to write proxy/stub code and handle other tricky tasks.
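On the callee side, the lenient convention (treating NULL as an empty string) can be sketched like this. It uses a hypothetical toy string with a 4-byte byte-count prefix before the text; `toy_len` and `count_char` are made-up names. The real SysStringLen is documented to return 0 for a NULL BSTR, which is what makes this convention cheap for callees that choose to adopt it; a strict callee would reject NULL instead.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t toy_char;   /* stand-in for OLECHAR */

/* Length in code units of a toy BSTR. NULL is treated as an empty string,
   the lenient convention; a strict callee would report an error instead. */
uint32_t toy_len(const toy_char *s)
{
    uint32_t bytes;
    if (s == NULL)
        return 0;
    memcpy(&bytes, (const unsigned char *)s - sizeof bytes, sizeof bytes);
    return bytes / (uint32_t)sizeof(toy_char);
}

/* Example callee: counts occurrences of ch, accepting NULL as "". */
uint32_t count_char(const toy_char *s, toy_char ch)
{
    uint32_t n = 0, len = toy_len(s);
    for (uint32_t i = 0; i < len; i++)
        if (s[i] == ch)
            n++;
    return n;
}
```

Because every length query goes through `toy_len`, the callee never dereferences a NULL pointer; the caller still can't know whether a given real component behaves this way without agreeing on it first.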

My first test program tried some uncommon allocation methods. Note that you can get BSTRs whose SysStringLen is 0 but that still have a real byte allocation. I'll also admit beforehand that bstr5 and bstr6 are not clean methods of allocation (they reinterpret a wide literal as bytes).

Here's the source:

#include <windows.h>
#include <oleauto.h>
#include <stdio.h>
#include <tchar.h>

#pragma comment(lib, "oleaut32.lib")  // SysAllocString and friends live in OleAut32

// Prints the pointer value, character length, and byte length of a BSTR.
static void Report(LPCTSTR label, BSTR bstr)
{
  _tprintf(_T("%s\r\n\tBSTR=0x%p, length %u, %u bytes\r\n"),
    label, (void *)bstr, SysStringLen(bstr), SysStringByteLen(bstr));
}

int _tmain(int argc, _TCHAR* argv[])
{
  BSTR bstr1 = SysAllocString(L"");
  BSTR bstr2 = SysAllocStringLen(NULL, 0);
  BSTR bstr3 = SysAllocStringLen(NULL, 1);  // contents are left uninitialized...
  if (bstr3 != NULL)
    bstr3[0] = L'\0';                       // ...so set the single OLECHAR
  BSTR bstr4 = SysAllocStringLen(L"some string", 0);
  // bstr5 and bstr6 reinterpret a wide literal as bytes (not a clean allocation)
  BSTR bstr5 = SysAllocStringByteLen((LPCSTR)L"", 1);
  BSTR bstr6 = SysAllocStringByteLen((LPCSTR)L"", 2);
  BSTR bstr7 = SysAllocStringByteLen("", 1);
  BSTR bstr8 = SysAllocStringByteLen("\0\0", 2);
  BSTR bstr9 = SysAllocStringByteLen(NULL, 0);
  BSTR bstr10 = SysAllocStringByteLen(NULL, 1);

  Report(_T("L\"\"-sourced BSTR"), bstr1);
  Report(_T("NULL BSTR with no alloc length"), bstr2);
  Report(_T("NULL BSTR with 1 OLECHAR alloc length"), bstr3);
  Report(_T("L\"some string\"-sourced BSTR with no alloc length"), bstr4);
  Report(_T("L\"\"-sourced BSTR with 1 byte alloc length"), bstr5);
  Report(_T("L\"\"-sourced BSTR with 2 byte alloc length"), bstr6);
  Report(_T("\"\"-sourced BSTR with 1 byte alloc length"), bstr7);
  Report(_T("\"\\0\\0\"-sourced BSTR with 2 byte alloc length"), bstr8);
  Report(_T("NULL-sourced BSTR with 0 byte alloc length"), bstr9);
  Report(_T("NULL-sourced BSTR with 1 byte alloc length"), bstr10);

  // Every one of these is a real allocation and must be freed,
  // including the ones that report a length of 0.
  SysFreeString(bstr1);
  SysFreeString(bstr2);
  SysFreeString(bstr3);
  SysFreeString(bstr4);
  SysFreeString(bstr5);
  SysFreeString(bstr6);
  SysFreeString(bstr7);
  SysFreeString(bstr8);
  SysFreeString(bstr9);
  SysFreeString(bstr10);

  return 0;
}

Here are the results I received.

L""-sourced BSTR
        BSTR=0x00175bdc, length 0, 0 bytes
NULL BSTR with no alloc length
        BSTR=0x00175c04, length 0, 0 bytes
NULL BSTR with 1 OLECHAR alloc length
        BSTR=0x00175c2c, length 1, 2 bytes
L"some string"-sourced BSTR with no alloc length
        BSTR=0x00175c54, length 0, 0 bytes
L""-sourced BSTR with 1 byte alloc length
        BSTR=0x00175c7c, length 0, 1 bytes
L""-sourced BSTR with 2 byte alloc length
        BSTR=0x00175ca4, length 1, 2 bytes
""-sourced BSTR with 1 byte alloc length
        BSTR=0x00175ccc, length 0, 1 bytes
"\0\0"-sourced BSTR with 2 byte alloc length
        BSTR=0x00175cf4, length 1, 2 bytes
NULL-sourced BSTR with 0 byte alloc length
        BSTR=0x00175d1c, length 0, 0 bytes
NULL-sourced BSTR with 1 byte alloc length
        BSTR=0x00175d44, length 0, 1 bytes
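These rows are consistent with the length prefix storing a byte count that SysStringByteLen reports directly, while SysStringLen reports that count divided by sizeof(OLECHAR) (2) and rounded down. That is my reading of the output above, not something I've confirmed in the OleAut32 source. A tiny sketch of that assumed relationship (function names are made up) shows why a 1-byte BSTR is "empty" by one definition but not the other:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed relationship: character length = byte count / 2, rounded down. */
uint32_t bstr_char_len(uint32_t byte_len)
{
    return byte_len / 2u;
}

/* "Empty" as SysStringLen would see it: no whole OLECHARs. */
bool empty_by_chars(uint32_t byte_len)
{
    return bstr_char_len(byte_len) == 0;
}

/* "Empty" as SysStringByteLen would see it: no bytes at all. */
bool empty_by_bytes(uint32_t byte_len)
{
    return byte_len == 0;
}
```

The 1-byte rows (bstr5, bstr7, bstr10) are empty by characters but not by bytes, and they are still real allocations that have to be freed.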

My next test program showed that shrinking a BSTR may return the same BSTR. Here is a short snippet that demonstrates this, along with the output I received. I also grew it beyond its original length and still received the same BSTR back. This suggests, at the very least, that one cannot assume a zero-length BSTR will never be grown in place.

#include <windows.h>
#include <oleauto.h>
#include <stdio.h>
#include <tchar.h>

#pragma comment(lib, "oleaut32.lib")  // SysAllocString and friends live in OleAut32

// Prints the pointer value, character length, and byte length of a BSTR.
static void Report(BSTR bstr)
{
  _tprintf(_T("\tBSTR=0x%p, length %u, %u bytes\r\n"),
    (void *)bstr, SysStringLen(bstr), SysStringByteLen(bstr));
}

int _tmain(int argc, _TCHAR* argv[])
{
  BSTR bstr1 = SysAllocString(L"hello world!");
  if (bstr1 == NULL)
    return 1;

  _tprintf(_T("L\"hello world!\"-sourced BSTR\r\n"));
  Report(bstr1);

  // Shrink to empty, then grow past the original length, watching the pointer.
  const OLECHAR *sources[] = { L"", L"hello!", L"hello world!+" };
  for (int i = 0; i < 3; ++i)
  {
    _tprintf(_T("resizing bstr1 to source L\"%ls\"\r\n"), sources[i]);
    if (!SysReAllocString(&bstr1, sources[i]))
      break;  // out of memory
    _tprintf(_T("L\"%ls\"-sourced reallocated BSTR\r\n"), sources[i]);
    Report(bstr1);
  }

  SysFreeString(bstr1);
  return 0;
}

Running this program on my workstation (Windows XP) produced the following results. I'd be interested to know whether anyone else gets a new BSTR anywhere along the way.

L"hello world!"-sourced BSTR
        BSTR=0x00175bdc, length 12, 24 bytes
resizing bstr1 to source L""
L""-sourced reallocated BSTR
        BSTR=0x00175bdc, length 0, 0 bytes
resizing bstr1 to source L"hello!"
L"hello!"-sourced reallocated BSTR
        BSTR=0x00175bdc, length 6, 12 bytes
resizing bstr1 to source L"hello world!+"
L"hello world!+"-sourced reallocated BSTR
        BSTR=0x00175bdc, length 13, 26 bytes
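One allocator design that produces this kind of pattern is a block that remembers its capacity and only reallocates when the new text no longer fits. The sketch below is a hypothetical illustration of that idea (`toy_set`, `toy_header` and the other names are made up), not a description of how OleAut32 implements SysReAllocString; in particular, the output above also grew the string past its original 24 bytes without the pointer changing, which this toy does not guarantee. The takeaway is the same either way: after a reallocation the caller must reload the pointer and must not rely on whether it changed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t toy_char;   /* stand-in for OLECHAR */

/* Hypothetical header stored just before the text: capacity and current
   length, both in bytes. */
typedef struct {
    uint32_t cap_bytes;
    uint32_t byte_len;
} toy_header;

static toy_header *header_of(toy_char *s)
{
    return (toy_header *)((unsigned char *)s - sizeof(toy_header));
}

/* Sets *pp (NULL or a previous result) to a copy of src[0..len).
   The block is reused whenever the new text fits its capacity, so shrinking
   never moves the string; growing past the capacity calls realloc, which may
   or may not move it. src must not point into *pp's block.
   Returns 1 on success, 0 on allocation failure (*pp is then unchanged). */
int toy_set(toy_char **pp, const toy_char *src, uint32_t len)
{
    uint32_t bytes = len * (uint32_t)sizeof(toy_char);
    toy_header *h = (*pp != NULL) ? header_of(*pp) : NULL;

    if (h == NULL || bytes > h->cap_bytes) {
        toy_header *grown = realloc(h, sizeof(toy_header) + bytes + sizeof(toy_char));
        if (grown == NULL)
            return 0;
        grown->cap_bytes = bytes;
        h = grown;
    }
    h->byte_len = bytes;
    toy_char *text = (toy_char *)(h + 1);
    memcpy(text, src, bytes);
    text[len] = 0;                                /* terminator */
    *pp = text;
    return 1;
}

uint32_t toy_len(toy_char *s)
{
    return s ? header_of(s)->byte_len / (uint32_t)sizeof(toy_char) : 0;
}

void toy_free(toy_char *s)
{
    if (s != NULL)
        free(header_of(s));
}
```

Setting "hello world!", then "", then "hello!" keeps the same pointer in this toy; setting the 13-character string exceeds the 24-byte capacity and may move it.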

I ran the program again, but this time started with an empty wide-character string (L""). This covers the case of starting with a BSTR that has no string length, to see whether it actually has some implicit size. When I ran it, I still received the same BSTR back each time. I expect, though, that results may vary here.

Here's the source:

#include <windows.h>
#include <oleauto.h>
#include <stdio.h>
#include <tchar.h>

#pragma comment(lib, "oleaut32.lib")  // SysAllocString and friends live in OleAut32

// Prints the pointer value, character length, and byte length of a BSTR.
static void Report(BSTR bstr)
{
  _tprintf(_T("\tBSTR=0x%p, length %u, %u bytes\r\n"),
    (void *)bstr, SysStringLen(bstr), SysStringByteLen(bstr));
}

int _tmain(int argc, _TCHAR* argv[])
{
  BSTR bstr1 = SysAllocString(L"");  // start from an empty BSTR this time
  if (bstr1 == NULL)
    return 1;

  _tprintf(_T("L\"\"-sourced BSTR\r\n"));
  Report(bstr1);

  // Grow, shrink, then grow again, watching whether the pointer changes.
  const OLECHAR *sources[] = { L"hello world!", L"hello!", L"hello world!+" };
  for (int i = 0; i < 3; ++i)
  {
    _tprintf(_T("resizing bstr1 to source L\"%ls\"\r\n"), sources[i]);
    if (!SysReAllocString(&bstr1, sources[i]))
      break;  // out of memory
    _tprintf(_T("L\"%ls\"-sourced reallocated BSTR\r\n"), sources[i]);
    Report(bstr1);
  }

  SysFreeString(bstr1);
  return 0;
}

The results:

L""-sourced BSTR
        BSTR=0x00175bdc, length 0, 0 bytes
resizing bstr1 to source L"hello world!"
L"hello world!"-sourced reallocated BSTR
        BSTR=0x00175bdc, length 12, 24 bytes
resizing bstr1 to source L"hello!"
L"hello!"-sourced reallocated BSTR
        BSTR=0x00175bdc, length 6, 12 bytes
resizing bstr1 to source L"hello world!+"
L"hello world!+"-sourced reallocated BSTR
        BSTR=0x00175bdc, length 13, 26 bytes
meklarian
But `SysReAllocString` creates a new pointer, so the empty string is still immutable; you just allocate a new string that is big enough.
Motti
It may create a new pointer, but it could also give the original BSTR a new allocation length. Current implementations in recent versions of Windows probably default to new BSTRs, but older implementations may have behaved differently, most likely in versions of Windows that predate 32-bit Windows/COM.
meklarian
Scratch that previous comment; see the updated answer.
meklarian