Hi there,

I'm having a strange encoding issue when converting an NSString to a UTF-8 C string. I'm fetching XML data from a web server, and the XML is correctly encoded as UTF-8. After fetching the data, I convert it to an NSString as follows:

NSString *XMLdata = [[[NSString alloc] initWithData: receivedData encoding: NSUTF8StringEncoding] autorelease];

When I write the result to stdout with NSLog, the output appears to be OK (all characters are readable).

But when I try to get a C string with [XMLdata UTF8String] or [XMLdata cStringUsingEncoding: NSUTF8StringEncoding], the non-ASCII characters (German in this case) are garbled (for instance "N√ºrnberg" instead of "Nürnberg").

I have no idea what's wrong here. Am I missing something, or is this a bug?

Any help is appreciated, thanks! Matthes

+1  A: 

Matthes, you are doing it correctly. Both conversions are fine, and you are apparently getting correct output. The strange results you are seeing are a result of NSLog not interpreting the C string as UTF-8.

Try out the following piece of code. I put the UTF-8 encoding of "Nürnberg" in s[]. The ü character is represented by the two-byte sequence 0xc3, 0xbc. The rest of the characters are encoded the same as their ASCII equivalents. (Verify with the UTF Converter and the UTF-8 encoding demo table.)

/* UTF-8 bytes of "Nürnberg", NUL-terminated */
char s[] = { 0x4e, 0xc3, 0xbc, 0x72, 0x6e, 0x62, 0x65, 0x72, 0x67, 0 };
printf("%s\n", s);   /* pass s as an argument, not as the format string */
NSLog(@"%s", s);

In the debugger's console window, you should get the following:

Nürnberg
2009-08-12 23:55:53.077 try8[4980:813] N√ºrnberg

The √º characters you are seeing in the NSLog output come from the Mac OS Roman encoding. If you follow the link, you'll find that, sure enough, 0xc3 maps to the √ character and 0xbc maps to º. Apparently that's the encoding NSLog uses for C strings.

Oren Trutner
A: 

Hi Oren,

thanks for your reply, but my issue is not only that NSLog shows the wrong characters: when the XML is parsed (using TinyXML) and the data is saved to an SQLite database (using Core Data), the wrong characters are saved there as well.

I understand that those characters are multibyte sequences, but I do not understand why they are apparently not handled correctly when the string is converted to a C string...

Anyway, I have now tried calling [XMLdata cStringUsingEncoding: NSMacOSRomanStringEncoding] and that did the trick, so thank you for pointing me in that direction!

By any chance, do you know how to determine the encoding used by the system? Reading the reference, I realized that the encoding depends on system settings (language, region, etc.). I tried to find out by calling [NSString defaultCStringEncoding], but it returns nil... I'd like to know whether there is a consistent way to handle such situations with various encodings (next time I might be dealing with Eastern European text or something else entirely).

thanks again, best

Matthes