Some sites that I am fetching data from return UTF-8 strings with the non-ASCII characters escaped, e.g.: \u5404\u500b\u90fd

Is there a built-in Cocoa function that can help with this, or will I have to write my own decoding algorithm?

+3  A: 

There is no built-in function to do C unescaping.

You can cheat a little with NSPropertyListSerialization since an "old text style" plist supports C escaping via \Uxxxx:

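// Sample input: contains literal double quotes and two \uXXXX escapes.
NSString* input = @"ab\"cA\"BC\\u2345\\u0123";

// Old-style plists spell the escape \Uxxxx, so rewrite \u as \U.
// Caveat: an escaped backslash followed by 'u' (e.g. @"abc\\\\uvw") is rewritten too, which is wrong.
NSString* esc1 = [input stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
// Escape literal double quotes so they survive inside the quoted plist string.
NSString* esc2 = [esc1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
// Wrap the text in double quotes to form an old-style plist string literal.
NSString* quoted = [[@"\"" stringByAppendingString:esc2] stringByAppendingString:@"\""];
NSData* data = [quoted dataUsingEncoding:NSUTF8StringEncoding];
// Parse the plist; the resulting object is the unescaped string.
NSString* unesc = [NSPropertyListSerialization propertyListFromData:data
                   mutabilityOption:NSPropertyListImmutable format:NULL
                   errorDescription:NULL];
assert([unesc isKindOfClass:[NSString class]]);
NSLog(@"Output = %@", unesc);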

Keep in mind, though, that this isn't very efficient. It is far better to write your own parser. (By the way, are you decoding JSON strings? If so, you could use one of the existing JSON parsers.)
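For reference, a hand-written decoder could look roughly like the sketch below. It is plain C, so it should compile unchanged inside an Objective-C file. The names unescape_unicode, parse_hex4 and put_utf8 are made up for this example, and the handling of malformed input (copied through unchanged) and of lone surrogates (replaced with U+FFFD) is an assumption, not something the question specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse exactly four hex digits at p; return the value, or -1 if malformed. */
static long parse_hex4(const char *p) {
    long value = 0;
    for (int i = 0; i < 4; i++) {
        char c = p[i];
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1; /* also stops at the terminating NUL */
        value = value * 16 + digit;
    }
    return value;
}

/* Write code point cp to out as UTF-8; return the number of bytes written. */
static size_t put_utf8(char *out, uint32_t cp) {
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode \uXXXX escapes in a UTF-8 C string. Returns a malloc'd string that
 * the caller must free, or NULL if allocation fails. Note that \u0000
 * produces an embedded NUL, which ends the C string early. */
char *unescape_unicode(const char *in) {
    /* The output is never longer than the input: a 6-byte escape becomes at
     * most 3 bytes, and a 12-byte surrogate pair becomes 4 bytes. */
    char *out = malloc(strlen(in) + 1);
    if (out == NULL) return NULL;
    size_t o = 0;
    while (*in) {
        if (in[0] == '\\' && in[1] == '\\') {
            /* Escaped backslash: copy both bytes so "\\u0041" stays as-is. */
            out[o++] = *in++;
            out[o++] = *in++;
            continue;
        }
        if (in[0] == '\\' && in[1] == 'u') {
            long hi = parse_hex4(in + 2);
            if (hi >= 0) {
                uint32_t cp = (uint32_t)hi;
                in += 6;
                /* Combine a UTF-16 surrogate pair into one code point. */
                if (cp >= 0xD800 && cp <= 0xDBFF && in[0] == '\\' && in[1] == 'u') {
                    long lo = parse_hex4(in + 2);
                    if (lo >= 0xDC00 && lo <= 0xDFFF) {
                        cp = 0x10000 + ((cp - 0xD800) << 10) + ((uint32_t)lo - 0xDC00);
                        in += 6;
                    }
                }
                /* Assumption: a lone surrogate becomes U+FFFD. */
                if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;
                o += put_utf8(out + o, cp);
                continue;
            }
        }
        out[o++] = *in++; /* ordinary byte or malformed escape: copy as-is */
    }
    out[o] = '\0';
    return out;
}
```

From Objective-C, the result can be turned back into a string with [NSString stringWithUTF8String:decoded], followed by free(decoded). Unlike the plist trick, this makes a single pass over the input and leaves an escaped backslash followed by 'u' alone.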

KennyTM
"There is no built-in function to do it" is what I was trying to find out. I ended up rolling my own and just wanted to check I wasn't reinventing the wheel. The existing JSON parsers are nowhere near forgiving enough of the badly formed JSON output that dodgy web sites sometimes send.
corydoras