views:

533

answers:

1

Xcode complaints about "multi-character character contant"'s when I try to do the following:

static unichar accent characters[] = { 'ā', 'á', 'ă', 'à' };

How do you make an array of characters, when not all of them are ascii? The following works just fine

static unichar accent[] = { 'a', 'b', 'c' }; 

Workaround

The closest work around I have found is to convert the special characters into hex, ie this works:

static unichar accent characters[] = { 0x0100, 0x0101, 0x0102 };
+6  A: 

It's not that Objective-C doesn't like it, it's that C doesn't. The constant 'c' is for char which has 1 byte, not unichar which has 2 bytes. (see the note below for a bit more detail.)

There's no perfectly supported way to represent a unichar constant. You can use

char* s="ü";

in a UTF-8-encoded source file to get the unicode C-string, or

NSString* s=@"ü";

in a UTF-8 encoded source file to get an NSString. (This was not possible before 10.5. It's OK for iPhone.)

NSString itself is conceptually encoding-neutral; but if you want, you can get the unicode character by using -characterAtIndex:.

Finally two comments:

  • If you just want to remove accents from the string, you can just use the method like this, without writing the table yourself:

    -(NSString*)stringWithoutAccentsFromString:(NSString*)s
    {
        if (!s) return nil;
        NSMutableString *result = [NSMutableString stringWithString:s];
        CFStringFold((CFMutableStringRef)result, kCFCompareDiacriticInsensitive, NULL);
        return result;
    }
    

    See the document of CFStringFold.

  • If you want unicode characters for localization/internationalization, you shouldn't embed the strings in the source code. Instead you should use Localizable.strings and NSLocalizedString. See here.

Note: For arcane historical reasons, 'a' is an int in C, see the discussions here. In C++, it's a char. But it doesn't change the fact that writing more than one byte inside '...' is implementation-defined and not recommended. For example, see ISO C Standard 6.4.4.10. However, it was common in classic Mac OS to write the four-letter code enclosed in single quotes, like 'APPL'. But that's another story...

Another complication is that accented letters are not always represented by 1 byte; it depends on the encoding. In UTF-8, it's not. In ISO-8859-1, it is. And unichar should be in UTF-16. Did you save your source code in UTF-16? I think the default of XCode is UTF-8. GCC might do some encoding conversion depending on the setup, too...

Yuji
Technically, literal characters `'a'` are of type `int` in C.
Chris Lutz
You're perfectly right. Corrected.
Yuji
Doesnt solve my problem, but thats the closes to correct answer I think I am going to get. (:
corydoras
About your workaround: Be very careful in embedding unicode character in hex! If I remember correctly, the unichar in OS X is UTF16 in the platform endianness. So, your code might not work in both PPC and intel, and/or iPhone. If you only care about one platform that's fine, but you should keep that in mind in case Apple changes the CPU.
Yuji