Hi. I am trying to escape double-byte (usually Japanese or Chinese) characters from a string so that they can be included in an RTF file. Thanks to poster falconcreek, I can successfully escape special characters (e.g. umlaut, accent, tilde) that are single-byte.

- (NSString *)stringFormattedRTF:(NSString *)inputString
{
    NSMutableString *result = [NSMutableString string];

    for ( int index = 0; index < [inputString length]; index++ ) {
        NSString *temp = [inputString substringWithRange:NSMakeRange( index, 1 )];
        unichar tempchar = [inputString characterAtIndex:index];

        if ( tempchar > 127) {
            [result appendFormat:@"\\\'%02x", tempchar];
        } else {
            [result appendString:temp];
        }
    }
    return result;
}

This appears to look for any Unicode character with a decimal value above 127 (which basically means anything that is not ASCII). When it finds one, it escapes the character and writes it out as a hex value.

EXAMPLE: Small "e" with acute accent gets escaped and converted to its hex value, resulting in "\'e9"

Asian characters are also above decimal 127, but for them the output appears to encode only the first byte of the double-byte Unicode character and then pass the second byte through as is. The end user just sees ????.
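
To make the symptom concrete, here is a minimal sketch that isolates just the format call from the method above. The helper name HexEscape is hypothetical and exists only for this illustration. It relies on the fact that "%02x" sets a minimum width of two hex digits, not a maximum, while an RTF \'hh escape is read as exactly two hex digits.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original method): reproduces only
// the appendFormat: call so the output can be inspected per character.
static NSString *HexEscape(unichar c)
{
    // "%02x" pads to at least two hex digits but does not truncate,
    // so any value above 0xFF produces more than two digits.
    return [NSString stringWithFormat:@"\\\'%02x", c];
}
```

HexEscape(0x00E9) (small "e" with acute accent) gives \'e9, which is a valid escape. HexEscape(0x306F) (the first Japanese character in the test string further down) gives \'306f, so a reader takes \'30 as one byte and treats the trailing "6f" as literal text.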

Suggestions are greatly appreciated. Thanks.

UPDATED Code sample based on suggestion. Not detecting. :(

NSString *myDoubleByteTestString = @"blah は凄くいいアップです blah åèüñ blah";
NSMutableString *resultDouble = [NSMutableString string];
for ( int index = 0; index < [myDoubleByteTestString length]; index++ )
{
    NSString *tempDouble = [myDoubleByteTestString substringWithRange:NSMakeRange( index, 1 )];
    NSRange doubleRange = [tempDouble rangeOfComposedCharacterSequenceAtIndex:index];
    if(doubleRange.length > 2)
    {
        NSLog(@"%@ is a double-byte character. Escape it.", tempDouble);
        // How to escape double-byte?
        [resultDouble appendFormat:tempDouble];
    }
    else
    {
        [resultDouble appendString:tempDouble];
    }
}

A:

Take a look at the code at rangeOfComposedCharacterSequenceAtIndex: to see how to get all of the characters in a composed character sequence. You'll then need to encode each of the characters in the resulting range.
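
As a rough illustration of that idea, the sketch below walks the string one composed character sequence at a time and encodes every UTF-16 unit in each range. Two things here are assumptions rather than part of the suggestion above: the function name RTFEscapedString is hypothetical, and the encoding step swaps the \'hh escape for RTF's \uN keyword, which takes a signed 16-bit decimal value followed by a fallback character. Treat it as a starting point, not a definitive implementation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): escapes non-ASCII text
// with the RTF \uN keyword instead of \'hh.
static NSString *RTFEscapedString(NSString *input)
{
    NSMutableString *result = [NSMutableString string];
    NSUInteger index = 0;

    while (index < [input length]) {
        // Ask the full string (not a one-character substring) for the range
        // of the composed character sequence containing this index.
        NSRange range = [input rangeOfComposedCharacterSequenceAtIndex:index];

        for (NSUInteger i = range.location; i < NSMaxRange(range); i++) {
            unichar c = [input characterAtIndex:i];
            if (c > 127) {
                // \uN expects a signed 16-bit decimal, so values above
                // 32767 are written as negative numbers. The '?' is the
                // fallback shown by readers that do not understand \u.
                [result appendFormat:@"\\u%d?", (int)(short)c];
            } else {
                // Plain ASCII is copied through unchanged.
                [result appendFormat:@"%C", c];
            }
        }
        // Continue with the next composed character sequence.
        index = NSMaxRange(range);
    }
    return result;
}
```

With this sketch, a precomposed "e" with acute accent (U+00E9) becomes \u233? and the Japanese character U+306F becomes \u12399?. Because \uN is applied to each UTF-16 unit separately, a plain loop over characterAtIndex: would give the same output; the composed-sequence range only matters if you want to treat each user-visible character as one unit. RTF control characters such as backslash and braces are not handled here.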

Robot K
Thanks for that suggestion. I've been mucking with it based on what I can find related to rangeOfComposedCharacterSequenceAtIndex. Posted an update to my question with the current code attempt. I am not able to get it detecting, much less escaping the double-byte characters.
DenVog