
In an Objective-C/Cocoa app, I am using C functions to open a text file, read it line by line, and pass some of the lines to a third-party function. In pseudo-code:

char line[1024];
while (fgets(line, sizeof line, aFile) != NULL)
    library_function(line);  // This function expects a UTF-8 encoded char * string

This works fine until the input file contains special characters (such as accented letters or the UTF-8 BOM), at which point the library function outputs mangled characters.
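For reference, the UTF-8 BOM is the three bytes EF BB BF at the very start of the file, and `fgets` hands them back unchanged as the first three bytes of the first line. A minimal sketch of stripping them, assuming the file really is UTF-8 (the name `skip_utf8_bom` is hypothetical, not a library function):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: if `line` starts with the UTF-8 byte order mark
 * (bytes EF BB BF), return a pointer just past it; otherwise return `line`.
 * A BOM, when present, sits at the very start of the file, so this only
 * needs to be applied to the first line returned by fgets(). */
const char *skip_utf8_bom(const char *line)
{
    static const char bom[] = "\xEF\xBB\xBF";

    assert(line != NULL);
    if (strncmp(line, bom, 3) == 0)
        return line + 3;
    return line;
}
```

Lines without a BOM, including ones shorter than three bytes, are returned untouched.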


However, if I do this:

char line[1024];
while (fgets(line, sizeof line, aFile) != NULL) {
    NSString *stringObj = [NSString stringWithUTF8String:line];
    library_function([stringObj UTF8String]);
}

Then everything works and the string is output correctly.


What is the `[NSString ...]` line doing that I'm not? Am I doing something wrong in how I fetch the line initially, or is it something else entirely?

+1  A: 

UTF-8 is a variable-width (multi-byte) encoding of Unicode (see Wikipedia), which means some characters, such as the accented ones you've run into, take more than one byte. C's `char` type is a single byte, so C's definition of "character" doesn't match Unicode's.
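To make the byte/character distinction concrete, here is a minimal C sketch that counts code points in a UTF-8 string by skipping continuation bytes (those with the bit pattern 10xxxxxx). The name `utf8_codepoint_count` is hypothetical, and the sketch assumes the input is already valid UTF-8:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * UTF-8 string. Each code point has exactly one lead byte; every other
 * byte of it is a continuation byte of the form 10xxxxxx.
 * No validation is done: the input is assumed to be valid UTF-8. */
size_t utf8_codepoint_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)   /* not a continuation byte */
            count++;
    }
    return count;
}
```

For example, "café" encoded as UTF-8 (`"caf\xC3\xA9"`) is 5 bytes, but `utf8_codepoint_count` returns 4, so code that treats each `char` as one character will miscount or split the é.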

If you want to read Unicode with the standard C runtime library, you'll also need a Unicode conversion library such as libiconv.
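As a rough sketch of what using a conversion library looks like, the function below calls the POSIX `iconv(3)` interface (which libiconv provides) to convert one line to UTF-8. The helper name `to_utf8` is hypothetical, and the source encoding is an assumption you have to supply yourself; ISO-8859-1 is used in the example only for illustration.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in`, assumed to be
 * in encoding `from_enc` (e.g. "ISO-8859-1"), to UTF-8 using iconv(3).
 * Writes at most out_size - 1 bytes plus a terminating NUL into `out`.
 * Returns 0 on success, -1 on failure (unknown encoding, invalid input,
 * or `out` too small). On macOS, link with -liconv. */
int to_utf8(const char *from_enc, const char *in, char *out, size_t out_size)
{
    assert(from_enc != NULL && in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", from_enc);
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)in;        /* iconv() takes char **, not const char ** */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = out_size - 1;   /* reserve one byte for the NUL */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outbuf, &outleft);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    *outbuf = '\0';
    return 0;
}
```

For instance, `to_utf8("ISO-8859-1", "caf\xE9", out, sizeof out)` turns the single Latin-1 byte for é into the two-byte UTF-8 sequence C3 A9. If the file is already UTF-8, no conversion is needed at all.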

(Using wchar_t may also work; I've never researched it.)

Or you can use NSString, which already supports Unicode.

Dewayne Christensen
Thanks, I think I'm getting that part. What I still don't get is why I have a valid `char` string after the Cocoa call but not before. What is it doing to the string to make it suddenly valid?
Ben