Apple's String Format Specifiers document claims:

The format specifiers supported by the NSString formatting methods and CFString formatting functions follow the IEEE printf specification; … You can also use these format specifiers with the NSLog function.

However, although the printf specification defines %C as equivalent to %lc and %S as equivalent to %ls, only %C and %S appear to work correctly with NSLog and +[NSString stringWithFormat:].

For example, consider the following code:

#import <Foundation/Foundation.h>

int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
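    unichar str[3];
    str[0] = 63743;  // U+F8FF, a private-use code point
    str[1] = 33;     // U+0021, '!'
    str[2] = 0;      // null terminator (NULL is a pointer constant, not a character)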

    NSLog(@"NSLog");
    NSLog(@"%%S:  %S", str);
    NSLog(@"%%ls: %ls", str);

    NSLog(@"%%C:  %C", str[0]);
    NSLog(@"%%lc: %lc", str[0]);

    NSLog(@"\n");
    NSLog(@"+[NSString stringWithFormat:]");

    NSLog(@"%%S:  %@", [NSString stringWithFormat:@"%S", str]);
    NSLog(@"%%ls: %@", [NSString stringWithFormat:@"%ls", str]);

    NSLog(@"%%C:  %@", [NSString stringWithFormat:@"%C", str[0]]);
    NSLog(@"%%lc: %@", [NSString stringWithFormat:@"%lc", str[0]]);

    [pool drain];
    return 0;
}

Given the printf specification, I would expect both lines in each of the above pairs to print the same thing. When I run the code, however, I get the following output:

2009-03-20 17:00:13.363 UnicharFormatSpecifierTest[48127:10b] NSLog
2009-03-20 17:00:13.365 UnicharFormatSpecifierTest[48127:10b] %S:  !
2009-03-20 17:00:13.366 UnicharFormatSpecifierTest[48127:10b] %ls: ˇ¯!
2009-03-20 17:00:13.366 UnicharFormatSpecifierTest[48127:10b] %C:  
2009-03-20 17:00:13.367 UnicharFormatSpecifierTest[48127:10b] %lc: 
2009-03-20 17:00:13.367 UnicharFormatSpecifierTest[48127:10b] 
2009-03-20 17:00:13.368 UnicharFormatSpecifierTest[48127:10b] +[NSString stringWithFormat:]
2009-03-20 17:00:13.368 UnicharFormatSpecifierTest[48127:10b] %S:  !
2009-03-20 17:00:13.369 UnicharFormatSpecifierTest[48127:10b] %ls: ˇ¯!
2009-03-20 17:00:13.369 UnicharFormatSpecifierTest[48127:10b] %C:  
2009-03-20 17:00:13.370 UnicharFormatSpecifierTest[48127:10b] %lc:

Am I doing something wrong, or is this a bug in Apple's code?

+3  A: 

On Mac OS X, <machine/_types.h> defines wchar_t as int, so it's four bytes (32 bits) on all currently-supported architectures.

As you note, the printf(3) manpage defines %S as equivalent to %ls, which takes a pointer to some wchar_t characters (wchar_t *).

The Cocoa documentation you linked to (and its CF equivalent), however, does define %S separately:

  • %S: Null-terminated array of 16-bit Unicode characters

The emphasis on "16-bit" is mine. The same goes for %C.

So, this is not a bug. CF and Cocoa interpret %S and %C differently from how printf and its cousins interpret them. CF and Cocoa treat the character(s) as 16-bit UTF-16 code units, whereas printf treats them as wchar_t values, which are 32 bits wide on Mac OS X (so presumably UTF-32).

The CF/Cocoa interpretation is more useful when working with Core Services, as some APIs (such as the File Manager) will hand you text as an array of UniChars, not a CFString; as long as you null-terminate that array, you can use it with %S to print the string.

Peter Hosey
Thanks; that makes a lot of sense! I think I would characterize this as a bug in the documentation, then, because the NSString formatting methods clearly do not follow the printf specification in this case. Does that seem like a fair assessment?
Evan DiBiase
Moderate bug; it should say something like “…, with a couple of variations”, and mark them in the table. The document does correctly describe CF/Cocoa's interpretations in the table, although it does not flag them as different from printf's definitions.
Peter Hosey
That's about how I'd characterize it, as well. Thanks again for the help!
Evan DiBiase