It's not that Objective-C doesn't like it, it's that C doesn't. The constant 'c' is a char, which is 1 byte, not a unichar, which is 2 bytes. (See the note below for a bit more detail.)
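Concretely (a minimal sketch; unichar is Foundation's 16-bit UTF-16 code-unit type):

#import <Foundation/Foundation.h>

int main(void) {
    // char is one byte; unichar (typedef'd to unsigned short) is two.
    NSLog(@"%zu %zu", sizeof(char), sizeof(unichar)); // logs "1 2"
    return 0;
}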
There's no fully supported way to write a unichar constant. You can use

char *s = "ü";

in a UTF-8-encoded source file to get the UTF-8 C string, or

NSString *s = @"ü";

in a UTF-8-encoded source file to get an NSString. (The latter was not possible before Mac OS X 10.5; it's fine on iPhone.) NSString itself is conceptually encoding-neutral, but if you want, you can get the UTF-16 code unit with -characterAtIndex:.
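For example (a minimal sketch; it assumes the source file is saved as UTF-8 and that the literal is the precomposed form of 'ü'):

#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSString *s = @"ü";
        // Returns the UTF-16 code unit at index 0; for 'ü' that is 0x00FC.
        // (Characters outside the BMP take two code units, so this is the
        // whole character only for BMP code points.)
        unichar c = [s characterAtIndex:0];
        NSLog(@"U+%04X", c); // logs "U+00FC"
    }
    return 0;
}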
Finally, two comments:

- If you just want to remove accents from a string, you can use a method like the following, without writing the table yourself:
-(NSString*)stringWithoutAccentsFromString:(NSString*)s
{
    if (!s) return nil;
    NSMutableString *result = [NSMutableString stringWithString:s];
    // Fold diacritics in place; NSMutableString is toll-free bridged to
    // CFMutableStringRef. (Under ARC, write (__bridge CFMutableStringRef).)
    CFStringFold((CFMutableStringRef)result, kCFCompareDiacriticInsensitive, NULL);
    return result;
}
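Hypothetical usage, from inside the same class:

NSString *folded = [self stringWithoutAccentsFromString:@"déjà vu"];
// folded is now @"deja vu"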
See the CFStringFold documentation.
- If you want Unicode strings for localization/internationalization, you shouldn't embed them in the source code. Instead, use Localizable.strings and NSLocalizedString. See the NSLocalizedString documentation.
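A minimal sketch (the key "welcome_message", the strings-file entry, and the helper function are made-up examples):

// In Localizable.strings (e.g. the German localization):
//   "welcome_message" = "Grüß dich!";

#import <Foundation/Foundation.h>

NSString *WelcomeMessage(void) {
    // Looks up "welcome_message" in Localizable.strings for the user's
    // language; the second argument is a comment for translators.
    return NSLocalizedString(@"welcome_message", @"Greeting shown at launch");
}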
Note:

For arcane historical reasons, 'a' is an int in C; in C++, it's a char. Either way, writing more than one byte inside '...' is implementation-defined and not recommended; see ISO C, 6.4.4.4p10. (It was nevertheless common on classic Mac OS to write four-character codes enclosed in single quotes, like 'APPL'. But that's another story...)
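A small C sketch of both points (the printed values are what GCC and Clang typically produce; the standard doesn't guarantee them):

#include <stdio.h>

int main(void) {
    // In C, 'a' has type int; in C++ it would be a char of size 1.
    printf("%zu %zu\n", sizeof('a'), sizeof(int)); // typically "4 4"
    // Multi-character constants compile (usually with a warning), but
    // their value is implementation-defined (C99 6.4.4.4p10).
    int fourcc = 'APPL';
    printf("0x%X\n", fourcc); // 0x4150504C with GCC/Clang; don't rely on it
    return 0;
}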
Another complication is that accented letters are not always one byte; it depends on the encoding. In UTF-8, 'ü' takes two bytes; in ISO-8859-1, it takes one. And unichar holds UTF-16. Did you save your source code as UTF-16? I believe Xcode's default is UTF-8. GCC may also do some encoding conversion, depending on its setup...
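You can see the encoding-dependent lengths directly (a minimal sketch; NSISOLatin1StringEncoding is ISO-8859-1):

#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSString *s = @"ü";
        NSLog(@"UTF-8: %lu byte(s)",      // logs 2
              (unsigned long)[s dataUsingEncoding:NSUTF8StringEncoding].length);
        NSLog(@"ISO-8859-1: %lu byte(s)", // logs 1
              (unsigned long)[s dataUsingEncoding:NSISOLatin1StringEncoding].length);
        NSLog(@"UTF-16 code units: %lu",  // logs 1
              (unsigned long)s.length);
    }
    return 0;
}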