Some sites that I am fetching data from are returning UTF-8 strings, with the UTF-8 characters escaped, ie: \u5404\u500b\u90fd
Is there a built in cocoa function that might assist with this or will I have to write my own decoding algorithm.
Some sites that I am fetching data from are returning UTF-8 strings, with the UTF-8 characters escaped, ie: \u5404\u500b\u90fd
Is there a built in cocoa function that might assist with this or will I have to write my own decoding algorithm.
There is no built-in function to do C unescaping.
You can cheat a little with NSPropertyListSerialization
since an "old text style" plist supports C escaping via \Uxxxx
:
NSString* input = @"ab\"cA\"BC\\u2345\\u0123";
// will cause trouble if you have "abc\\\\uvw"
NSString* esc1 = [input stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
NSString* esc2 = [esc1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
NSString* quoted = [[@"\"" stringByAppendingString:esc2] stringByAppendingString:@"\""];
NSData* data = [quoted dataUsingEncoding:NSUTF8StringEncoding];
NSString* unesc = [NSPropertyListSerialization propertyListFromData:data
mutabilityOption:NSPropertyListImmutable format:NULL
errorDescription:NULL];
assert([unesc isKindOfClass:[NSString class]]);
NSLog(@"Output = %@", unesc);
but mind that this isn't very efficient. It's far better if you write up your own parser. (BTW are you decoding JSON strings? If yes you could use the existing JSON parsers.)