I was trying to answer another question about the == operator and I created this code:

    NSString *aString = @"Hello";
    NSString *bString = aString;
    NSString *cString = @"Hello";

    if (aString == bString)
        NSLog(@"CHECK 1");

    if (bString == cString)
        NSLog(@"CHECK 2");

    if ([aString isEqual:bString])
        NSLog(@"CHECK 3");

    if ([aString isEqual:cString])
        NSLog(@"CHECK 4");

    NSLog(@"%i", aString);
    NSLog(@"%i", bString);
    NSLog(@"%i", cString);

But was surprised at the results:

    Equal[6599:10b] CHECK 1
    Equal[6599:10b] CHECK 2
    Equal[6599:10b] CHECK 3
    Equal[6599:10b] CHECK 4
    Equal[6599:10b] 8240
    Equal[6599:10b] 8240
    Equal[6599:10b] 8240

Is there some compiler trickery going on here?

+2  A: 

Well, for cString and aString: C, C++, and Objective-C compilers can reuse a single compile-time string object when the same literal appears in more than one location.
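
The same effect can be seen with plain C string literals. The following is only a rough sketch: the function names `same_object`, `same_contents`, and `compare_literals` are made up for this illustration, and whether the two `"Hello"` literals share storage is left to the compiler, so the last line may print either answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Identity: do both pointers refer to the same storage?
   This is what == tests on NSString * values too. */
bool same_object(const char *a, const char *b) {
    return a == b;
}

/* Equality: do both strings hold the same characters?
   Roughly the C analogue of -isEqual:. */
bool same_contents(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

/* Hypothetical demo mirroring aString, bString, and cString. */
void compare_literals(void) {
    const char *a = "Hello";
    const char *b = a;          /* copy of the pointer, not of the characters */
    const char *c = "Hello";    /* the compiler may or may not pool this with a */
    char d[] = "Hello";         /* an array initialised from a literal has its own storage */

    assert(same_object(a, b));    /* always true */
    assert(same_contents(a, c));  /* always true */
    assert(!same_object(a, d));   /* always true: d is a distinct array */

    /* Implementation-dependent: true when identical literals are pooled. */
    printf("a == c: %s\n", same_object(a, c) ? "yes" : "no");
}
```

Calling `compare_literals()` from `main` shows whether your compiler pooled the two literals. Either way, only the content comparison is something you can rely on.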

chollida
+4  A: 

NSString is defined as an immutable type, so whenever the compiler can optimize things by combining identical strings, it should. As your code demonstrates, gcc clearly does perform this optimization for simple cases.

This is correct. The internal type of the string is actually NSConstantString, and it cannot be released. It is made static by the compiler, and it apparently overrides its -release method to be a no-op.
Dave DeLong
A: 

Maybe a simple copy-on-write optimization? As all 3 strings point to the same 'set of characters', there's no point in creating separate copies until you modify one of the strings.

The characters are probably stored in a static part of memory (with the code), and the NSString pointers point to that part of memory. Once you attempt to modify one of the strings, it will create a new string somewhere else (on the heap) and then reference that memory.

stefanB
You can’t modify NSStrings.
Ahruman
True, I was thinking more conceptually ... anyway seems like they all point to the same part of memory because they represent the same group of characters ...
stefanB
+5  A: 

There is clearly string uniquing going on, at least within a single compilation unit. I recommend you take a brief tour through man gcc during which you visit all uses of "string". You'll find a few options that are directly relevant to literal NSStrings and their toll-free-bridged counterparts, CFStrings:

  • -fconstant-string-class=class-name sets the name of the class used to instantiate @"..." literals. It defaults to NSConstantString unless you're using the GNU runtime. (If you don't know if you are, you aren't.)
  • -fconstant-cfstrings enables use of a builtin to create CFStrings when you write CFSTR(...).

You can disable uniquing for C string literals using -fwritable-strings, though this option is deprecated. I couldn't come up with a combination of options that would stop the uniquing of NSString literals in an Objective-C file. (Anyone want to speak to Pascal string literals?)

You see -fconstant-cfstrings coming into play in CFString.h's definition of the CFSTR() macro used to create CFString literals:

    #ifdef __CONSTANT_CFSTRINGS__
    #define CFSTR(cStr)  ((CFStringRef) __builtin___CFStringMakeConstantString ("" cStr ""))
    #else
    #define CFSTR(cStr)  __CFStringMakeConstantString("" cStr "")
    #endif

If you look at the implementation of the non-builtin __CFStringMakeConstantString() in CFString.c, you'll see that the function does indeed perform uniquing using a very large CFMutableDictionary:

    if ((result = (CFStringRef)CFDictionaryGetValue(constantStringTable, cStr))) {
        __CFSpinUnlock(&_CFSTRLock);
    }
    // . . .
    return result;
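
To make the idea of uniquing concrete, here is a minimal sketch of a string-interning table in plain C. It is not the CoreFoundation implementation: it assumes a small fixed-size array with a linear scan in place of the CFMutableDictionary, it does no locking (the real function takes a spin lock), and the names `intern`, `interned`, and `MAX_INTERNED` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INTERNED 64  /* hypothetical fixed capacity, for illustration only */

static const char *interned[MAX_INTERNED];
static size_t interned_count = 0;

/* Returns one canonical pointer per distinct string content.
   Returns NULL if the table is full or allocation fails.
   Not thread-safe: a real implementation would guard the table with a lock. */
const char *intern(const char *s) {
    assert(s != NULL);

    /* Seen before? Hand back the existing copy. */
    for (size_t i = 0; i < interned_count; i++) {
        if (strcmp(interned[i], s) == 0) {
            return interned[i];
        }
    }

    if (interned_count == MAX_INTERNED) {
        return NULL;
    }

    /* First time: store a private copy and remember it. */
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len);
    interned[interned_count++] = copy;
    return copy;
}
```

Two calls with equal contents, even from different buffers, return the same pointer, which is why a pointer comparison on uniqued strings happens to succeed.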

See also responses to the question, "What's the difference between a string constant and a string literal?"

Jeremy W. Sherman
+1 Excellent answer, jayw.
Dave DeLong