The NSString -initWithData:encoding: method returns nil if it fails, so you can try one encoding after another until you find one that converts. This doesn't guarantee that you'll convert all the characters correctly, but if your feed source isn't sending you correctly encoded XML, then you'll probably have to live with it.
The basic idea is:
// try the most likely encoding
NSString *xmlString = [[NSString alloc] initWithData:xmlData
                                            encoding:NSUTF8StringEncoding];
if (xmlString == nil) {
    // try the next likely encoding
    xmlString = [[NSString alloc] initWithData:xmlData
                                      encoding:NSWindowsCP1252StringEncoding];
}
if (xmlString == nil) {
    // etc...
}
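The same idea can also be written as a loop over a list of candidate encodings. This is a minimal sketch; the function name DecodeWithCandidates and the candidate list are illustrative assumptions, not part of any Apple API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: try each candidate encoding in order and return the
// first successful decode, or nil if none of them converts the data.
static NSString *DecodeWithCandidates(NSData *data, NSArray<NSNumber *> *candidates) {
    for (NSNumber *encodingNumber in candidates) {
        NSStringEncoding encoding = encodingNumber.unsignedIntegerValue;
        NSString *string = [[NSString alloc] initWithData:data encoding:encoding];
        if (string != nil) {
            return string;
        }
    }
    return nil;
}
```

A caller might pass something like @[@(NSUTF8StringEncoding), @(NSWindowsCP1252StringEncoding), @(NSMacOSRomanStringEncoding), @(NSISOLatin1StringEncoding)], with the strictest encodings first.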
To be generic and robust, you could try the following steps in order, stopping at the first one that produces a string:
1.) Try the encoding specified in the Content-Type header of the HTTP response (if any)
2.) Check the start of the response data for a byte order mark and if found, try the indicated encoding
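3.) If there's no BOM, look at the first two bytes: a whitespace character or '<' paired with a nul (zero) byte means the data is probably UTF-16, big-endian if the zero byte comes first and little-endian if it comes second (similarly, you can check the first four bytes to see if you have UTF-32)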
4.) Scan the start of the data for the <?xml ... ?> processing instruction and look for encoding='something' inside it; try that encoding.
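5.) Try some common encodings. If your data source is in English, definitely check Windows Latin-1 (CP1252), Mac Roman, and ISO Latin-1. Try these last: single-byte encodings like ISO Latin-1 can decode almost any byte sequence, so they rarely fail even when they're the wrong choice.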
6.) If none of the above works, you could remove all bytes greater than 127 (or replace them with '?' or another ASCII character) and convert the data using the ASCII encoding.
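Steps 2 and 3 only need the raw bytes, so they can be checked before any decoding. Here is a minimal sketch of both; the function name GuessEncodingFromPrefix and its out-parameters are hypothetical. It reports the BOM length separately so the caller can decode the bytes after the mark with the explicit-endian encoding instead of relying on how Foundation treats a leading BOM.

```objc
#import <Foundation/Foundation.h>
#include <string.h>

// '<' or XML whitespace: the characters a document can legally start with.
static BOOL IsXMLStartByte(unsigned char c) {
    return c == '<' || c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Hypothetical helper for steps 2 and 3. On success, sets *outEncoding and
// *outBOMLength (bytes to skip before decoding) and returns YES.
static BOOL GuessEncodingFromPrefix(NSData *data,
                                    NSStringEncoding *outEncoding,
                                    NSUInteger *outBOMLength) {
    const unsigned char *b = data.bytes;
    NSUInteger n = data.length;

    // Step 2: byte order marks. UTF-32 is tested before UTF-16 because the
    // UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
    struct { unsigned char bytes[4]; NSUInteger length; NSStringEncoding encoding; } boms[] = {
        {{0x00, 0x00, 0xFE, 0xFF}, 4, NSUTF32BigEndianStringEncoding},
        {{0xFF, 0xFE, 0x00, 0x00}, 4, NSUTF32LittleEndianStringEncoding},
        {{0xEF, 0xBB, 0xBF}, 3, NSUTF8StringEncoding},
        {{0xFE, 0xFF}, 2, NSUTF16BigEndianStringEncoding},
        {{0xFF, 0xFE}, 2, NSUTF16LittleEndianStringEncoding},
    };
    for (size_t i = 0; i < sizeof(boms) / sizeof(boms[0]); i++) {
        if (n >= boms[i].length && memcmp(b, boms[i].bytes, boms[i].length) == 0) {
            *outEncoding = boms[i].encoding;
            *outBOMLength = boms[i].length;
            return YES;
        }
    }

    // Step 3: no BOM. The position of the zero bytes around the first
    // character gives both the code unit size and the byte order.
    *outBOMLength = 0;
    if (n >= 4 && b[0] == 0 && b[1] == 0 && b[2] == 0 && IsXMLStartByte(b[3])) {
        *outEncoding = NSUTF32BigEndianStringEncoding;
        return YES;
    }
    if (n >= 4 && IsXMLStartByte(b[0]) && b[1] == 0 && b[2] == 0 && b[3] == 0) {
        *outEncoding = NSUTF32LittleEndianStringEncoding;
        return YES;
    }
    if (n >= 2 && b[0] == 0 && IsXMLStartByte(b[1])) {
        *outEncoding = NSUTF16BigEndianStringEncoding;
        return YES;
    }
    if (n >= 2 && IsXMLStartByte(b[0]) && b[1] == 0) {
        *outEncoding = NSUTF16LittleEndianStringEncoding;
        return YES;
    }
    return NO;
}
```

A caller would skip *outBOMLength bytes (for example with -subdataWithRange:) and then call -initWithData:encoding: with the returned encoding.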
If you don't have an NSString by this point, you should fail. If you do have an NSString, you should look for the encoding declaration in the <?xml ... ?> processing instruction (if you didn't already in step 4). If it's there, you should convert the NSString back to NSData using that encoding; if it's not there, you should convert back using UTF-8, which is what an XML parser assumes when there is neither a declaration nor a byte order mark.
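Converting back is a single call. This sketch assumes the declared encoding, if any, has already been mapped to an NSStringEncoding and is passed as an NSNumber (nil when there is no declaration); DataForParser is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: re-encode the decoded string for the XML parser,
// using the declared encoding if there is one and UTF-8 otherwise.
static NSData *DataForParser(NSString *xmlString, NSNumber *declaredEncoding) {
    NSStringEncoding target = (declaredEncoding != nil)
        ? declaredEncoding.unsignedIntegerValue
        : NSUTF8StringEncoding;  // XML's default with no declaration or BOM
    // Returns nil if some character can't be represented in the target encoding.
    return [xmlString dataUsingEncoding:target];
}
```

-dataUsingEncoding: refuses lossy conversions; -dataUsingEncoding:allowLossyConversion: with YES will substitute characters instead, at the cost of silently changing the text.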
Also, the CFStringConvertIANACharSetNameToEncoding() and CFStringConvertEncodingToNSStringEncoding() functions can help you get the NSStringEncoding that goes with the encoding name from the Content-Type header or the <?xml ... ?> processing instruction.
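Here is a minimal sketch of step 4 combined with those two functions. EncodingFromXMLDeclaration is a hypothetical name, the 256-byte search window is an arbitrary assumption, and it only handles ASCII-compatible data (UTF-16 and UTF-32 are covered by steps 2 and 3).

```objc
#import <Foundation/Foundation.h>
#import <CoreFoundation/CoreFoundation.h>

// Hypothetical helper: find encoding="..." in the <?xml ... ?> declaration and
// map the IANA name to an NSStringEncoding. Returns YES on success.
static BOOL EncodingFromXMLDeclaration(NSData *data, NSStringEncoding *outEncoding) {
    // The declaration is ASCII, so decoding a short prefix as ISO Latin-1
    // (which accepts any byte) is enough to search it.
    NSUInteger prefixLength = MIN(data.length, (NSUInteger)256);
    NSString *prefix = [[NSString alloc] initWithData:[data subdataWithRange:NSMakeRange(0, prefixLength)]
                                             encoding:NSISOLatin1StringEncoding];
    if (prefix == nil) {
        return NO;
    }
    NSRange start = [prefix rangeOfString:@"<?xml"];
    if (start.location == NSNotFound) {
        return NO;
    }
    NSRange rest = NSMakeRange(start.location, prefix.length - start.location);
    NSRange end = [prefix rangeOfString:@"?>" options:0 range:rest];
    if (end.location == NSNotFound) {
        return NO;
    }
    NSString *declaration =
        [prefix substringWithRange:NSMakeRange(start.location, end.location - start.location)];

    // Accept either quote style: encoding="name" or encoding='name'.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']"
                             options:0
                               error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:declaration
                                                    options:0
                                                      range:NSMakeRange(0, declaration.length)];
    if (match == nil) {
        return NO;
    }
    NSString *name = [declaration substringWithRange:[match rangeAtIndex:1]];

    CFStringEncoding cfEncoding = CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return NO;
    }
    *outEncoding = CFStringConvertEncodingToNSStringEncoding(cfEncoding);
    return YES;
}
```

For step 1, NSURLResponse's textEncodingName property already holds the charset name from the Content-Type header, and it can go through the same two CF calls.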