I'm running into a bit of a weird issue. Whenever I create a new text file in my iOS application, I set its encoding to be NSUTF8StringEncoding. If I edit the file and input any characters with diacritics and save the changes, the diacritics render properly in some applications such as BBEdit, TextMate, cat and vi but not in others such as TextEdit, Quick Look and Pages.

I'm using the following code to save the contents of a UITextView to a plain text file.

  NSError *error = nil;
  NSString *dataString = self.textView.text;
  // Write the text view's contents to disk as UTF-8.
  BOOL savedChanges = [dataString writeToFile:fullPath atomically:YES encoding:NSUTF8StringEncoding error:&error];
  if (!savedChanges)
  {
    // Pop up an alert saying something went wrong.
  }

[Screenshot: what happens after saving my file. It should say "I Love It" with some weird diacritics.]

The Unix file command reports that the saved file is indeed "UTF-8 Unicode text, with no line terminators".

What's even weirder is if I save the file again without changing the contents of the text, the file will then render properly in Quick Look & TextEdit on my Mac.

Any help would be appreciated.

+1  A: 

This is a guess, but could it have something to do with the lack of a byte order mark? For UTF-8, the BOM is EF BB BF in hex, and should be the very first thing in the file.

Colin Barrett
The BOM is not needed for UTF-8 (as its byte ordering is consistent regardless of endianness), and should generally be avoided as a lot of apps don't properly deal with a UTF-8 BOM.
adurdin
Hrm, yeah, you are probably right.
Colin Barrett
However, after a quick test, putting the BOM in *does* cause QuickLook and TextEdit to interpret it correctly in this case. Possibly the file is too short for the OS to identify it with certainty as UTF-8. It's certainly treating it as MacRoman.
adurdin
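For anyone who wants to try the BOM route from code, here is a minimal sketch in plain C (which can be called from Objective-C). The function name write_utf8_with_bom is made up for illustration; it just writes the three bytes EF BB BF followed by text that is assumed to already be UTF-8.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): writes the UTF-8 BOM
   (EF BB BF) followed by utf8_text, which is assumed to already be UTF-8.
   Returns 0 on success, -1 on failure. */
int write_utf8_with_bom(const char *path, const char *utf8_text)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    size_t len = strlen(utf8_text);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        return -1;
    }

    /* The BOM has to be the very first thing in the file. */
    int ok = fwrite(bom, 1, sizeof bom, fp) == sizeof bom
          && fwrite(utf8_text, 1, len, fp) == len;

    /* fclose flushes buffered data, so a failure here also counts. */
    if (fclose(fp) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

In the question's code you could presumably pass [fullPath fileSystemRepresentation] and [dataString UTF8String] as the two arguments. Keep in mind the earlier comment that some apps don't handle a UTF-8 BOM well.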
A: 

Another alternative if you want to avoid the BOM is to set the com.apple.TextEncoding extended attribute to UTF-8;134217984 on the file -- See here for more details.

I don't know how you would do that from code, but you can run xattr -w com.apple.TextEncoding 'UTF-8;134217984' filename.txt at the command line to confirm that it fixes the issue for you.

adurdin
+1  A: 

If you save a text file without a UTF BOM, and the com.apple.TextEncoding xattr is not set, any software that opens it will have to guess at the correct character encoding. Some apps guess UTF-8, some guess Mac OS Roman, and others guess something else.

You can replicate this behavior by saving a file as UTF-8 with no BOM, and then in Terminal give the xattr -d com.apple.TextEncoding filename.txt command.

To set the xattr, you would call setxattr(). There doesn't seem to be a documented way to set it via a Cocoa API. You could also prefix your data with the UTF-8 BOM.
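A rough sketch of the setxattr() route, in plain C so it can be called from Objective-C. The function names format_text_encoding and mark_file_as_utf8 are made up for illustration. The attribute name and the value UTF-8;134217984 are the ones from the xattr command in the earlier answer. The call uses the six-argument macOS form of setxattr(), so it is guarded with __APPLE__; on other platforms this sketch simply reports failure.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#ifdef __APPLE__
#include <sys/xattr.h>
#endif

/* Hypothetical helper: builds "<IANA name>;<CFStringEncoding number>",
   e.g. "UTF-8;134217984". Returns the string length, or -1 if buf is
   too small. */
int format_text_encoding(char *buf, size_t size,
                         const char *iana_name, unsigned long cf_encoding)
{
    int n = snprintf(buf, size, "%s;%lu", iana_name, cf_encoding);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: tags the file at path as UTF-8 by setting the
   com.apple.TextEncoding extended attribute. Returns 0 on success,
   -1 on failure (with errno set). */
int mark_file_as_utf8(const char *path)
{
    char value[64];
    int len = format_text_encoding(value, sizeof value, "UTF-8", 134217984UL);
    if (len < 0) {
        return -1;
    }
#ifdef __APPLE__
    /* macOS signature: path, name, value, size, position, options.
       The value is stored without a trailing NUL byte. */
    return setxattr(path, "com.apple.TextEncoding", value, (size_t)len, 0, 0);
#else
    /* The attribute is only meaningful on Apple platforms. */
    (void)path;
    errno = ENOTSUP;
    return -1;
#endif
}
```

You would call mark_file_as_utf8 after the file has been written, for example with [fullPath fileSystemRepresentation] as the path.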

There's the question of what character encoding should be assumed when the BOM and xattr are missing. Is it a bug if it defaults to Mac OS Roman? Should UTF-8 be the default?

lucius
A: 

I experienced the same issue just yesterday! My problem was that files created by NSString's writeToFile:atomically:encoding:error: were being read and interpreted by Quick Look and TextEdit perfectly, but files created by writing UTF-8 data straight to a file with NSFileHandle did not work in Quick Look and TextEdit (the automatic encoding detection got it wrong).

Peter Hosey pointed me to this question, and after some testing it turns out that the NSString method does not use the BOM; instead it just sets the extended attribute com.apple.TextEncoding. With some code I found online I created an NSString category that lets you set this value easily by passing a file path and an NSStringEncoding value.

NSString (FileTextEncodingAttribute) on gist.github

An example would be:

[NSString setTextEncodingAttribute:NSUTF8StringEncoding atPath:@"myfile.txt"];

I've tested this and it works perfectly :) Who knew writing a plain text file would take so much research!!

Michael Waterfall