ansaurus

Question

Answer 1

A:

Normally, if you have a £ encoded as ISO-8859-1 (i.e. a single byte 0xA3), that's not going to form part of a valid UTF-8 byte sequence, unless you're unlucky and it comes right after another top-bit-set character in such a way as to make the pair a valid UTF-8 sequence. (You could guard against that by putting a £ on its own at the top of the file.)
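A quick way to see both cases is to hand the bytes to Python's strict UTF-8 decoder. This is a small sketch; the helper name `is_valid_utf8` is hypothetical, made up for the illustration. A lone 0xA3 is rejected, while the "unlucky" pair 0xC2 0xA3 (which ISO-8859-1 would display as `Â£`) happens to be exactly the UTF-8 encoding of £.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


lone_pound = "£".encode("iso-8859-1")   # b'\xa3': a single byte
unlucky = "Â£".encode("iso-8859-1")     # b'\xc2\xa3': 0xC2 is a UTF-8 lead byte

print(is_valid_utf8(lone_pound))        # False: 0xA3 is a continuation byte with no lead
print(is_valid_utf8(unlucky))           # True
print(unlucky.decode("utf-8"))          # £  (two Latin-1 characters collapse into one)

# The guard from the answer: a lone £ on its own line at the top of the file
guarded = "£\n".encode("iso-8859-1") + unlucky
print(is_valid_utf8(guarded))           # False
```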

So no editor should open any such file as UTF-8; if it did, it would lose the £ completely. If your editor does that, “use a different editor”. Seriously! If your problem is that your editor loads files containing no £ (or any other non-ASCII character) as UTF-8, so that any new £ you add gets saved as UTF-8, then again, simply adding a £ on its own to the top of the file should stop that.
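The ASCII-only case can be sketched with a toy detector. `guess_encoding` and `add_pound_and_save` are hypothetical stand-ins for whatever heuristic a given editor really uses; the sketch assumes the rule "if it decodes as UTF-8, it is UTF-8, otherwise fall back to cp1252". A pure-ASCII file passes the UTF-8 test, so a newly typed £ gets written as two bytes; a lone ISO-8859-1 £ on the first line tips the guess the other way.

```python
def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Toy heuristic, hypothetical: valid UTF-8 is treated as UTF-8, else the fallback."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


def add_pound_and_save(data: bytes) -> bytes:
    """Load with the guessed encoding, append a line containing £, save in that encoding."""
    enc = guess_encoding(data)
    text = data.decode(enc) + "cost = £5\n"
    return text.encode(enc)


ascii_only = b"cost = 5\n"
print(guess_encoding(ascii_only))                      # utf-8
print(b"\xc2\xa3" in add_pound_and_save(ascii_only))   # True: the new £ is saved as UTF-8

guarded = b"\xa3\n" + ascii_only                       # lone ISO-8859-1 £ on the first line
saved = add_pound_and_save(guarded)
print(guess_encoding(guarded))                         # cp1252
print(b"\xc2\xa3" in saved, saved.count(b"\xa3"))      # False 2
```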

What you can't necessarily do is make the editor load it as ISO-8859-1 as opposed to any other character set in which every single top-bit-set byte is valid. Only multibyte encodings such as UTF-8 and Shift-JIS can be ruled out this way, by including byte sequences that are invalid in that encoding.
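To illustrate why only the multibyte case can be ruled out, here's a sketch that tries one byte against a handful of single-byte code pages plus UTF-8. The list of code pages is an arbitrary sample chosen for the illustration. Every single-byte table accepts 0xA3, though they don't all agree on which character it is, so a lone byte can't single out ISO-8859-1.

```python
SINGLE_BYTE = ["iso-8859-1", "iso-8859-15", "cp1252", "cp437", "koi8-r", "mac-roman"]


def try_decodings(data: bytes) -> dict:
    """Map each encoding name to the decoded text, or None if the bytes are invalid."""
    results = {}
    for enc in SINGLE_BYTE + ["utf-8"]:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None  # only a multibyte encoding can reject a lone byte
    return results


for enc, text in try_decodings(b"\xa3").items():
    print(f"{enc:12} {text!r}")
```

Only the UTF-8 entry comes back as `None`; the Latin tables mostly agree on £, while a code page such as cp437 maps the same byte to a different letter.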

What will usually happen on Windows is that the editor loads the file using the system default code page, typically 1252 on a Western machine. (That's not quite the same as ISO-8859-1, but close: the two differ only in the bytes 0x80 to 0x9F.)
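The "close but not quite" part can be checked byte by byte. This sketch compares Python's `cp1252` and `latin-1` codecs over all 256 byte values; the only disagreements fall in 0x80 to 0x9F, where ISO-8859-1 has C1 control codes and Windows-1252 has printable characters such as €. A few slots in that range are unassigned in Python's cp1252 table and raise an error, which the sketch counts as a difference.

```python
def cp1252_vs_latin1() -> list:
    """Return the byte values that cp1252 and ISO-8859-1 decode differently."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        try:
            cp = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp = None  # e.g. 0x81: unassigned in Python's cp1252 codec
        if cp != raw.decode("latin-1"):
            differing.append(value)
    return differing


diff = cp1252_vs_latin1()
print(hex(diff[0]), hex(diff[-1]), len(diff))                      # 0x80 0x9f 32
print(b"\x80".decode("cp1252"), repr(b"\x80".decode("latin-1")))   # € '\x80'
```

Since 0xA3 lies outside that range, a £ reads the same under either interpretation.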

Some editors have a feature where you can give them a hint about which encoding to use with a comment in the first line, e.g. for vim:

# vim: set fileencoding=iso-8859-1 :

The syntax varies from editor to editor and configuration to configuration, and it's usually pretty ugly. Other controls may exist to change the default encoding on a per-directory basis, but since we don't know which editor you're using, it's hard to be more specific.
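As it happens, the vim form above also matches the source-encoding declaration that Python itself honours (PEP 263 lists this exact style), so the standard library's `tokenize.detect_encoding` makes a convenient illustration of a tool picking up such a hint from the first lines. This is only an analogy; it says nothing about how vim or any other editor implements the feature. `decode_with_hint` is a hypothetical wrapper.

```python
import io
import tokenize


def decode_with_hint(data: bytes) -> tuple:
    """Decode data using a PEP 263-style coding comment if present; default is UTF-8."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding, data.decode(encoding)


hinted = b"# vim: set fileencoding=iso-8859-1 :\nprice = '\xa35'\n"
enc, text = decode_with_hint(hinted)
print(enc)                     # iso-8859-1
print(text.splitlines()[1])    # price = '£5'

enc, _ = decode_with_hint(b"price = 5\n")
print(enc)                     # utf-8  (no hint, so Python's default applies)
```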

In the long run, files stored as ISO-8859-1 or any other encoding that isn't UTF-8 need to go away and die, of course. :-)

bobince 2010-07-09 17:16:07

A wonderfully technical answer. I will try it...

Dougal 2010-07-09 17:38:24

Best answer for raw technicalness *and* because "use a different editor" (kind of) turned out to be the best solution! I am embarrassed and humbled ;-)

Dougal 2010-07-11 17:44:03

Answer 2

A:

You can put the character ÿ (0xFF) in the file. That byte is never valid in UTF-8. BBEdit on Mac correctly identifies such a file as ISO-8859-1. I'm not sure how your editor of choice will handle it.
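The 0xFF trick works because that byte can never occur anywhere in well-formed UTF-8, not merely at the start of a sequence (the same is true of 0xC0, 0xC1 and 0xF5 to 0xFE). A brute-force sketch: encode every Unicode scalar value and check whether the byte ever shows up. `never_in_utf8` is a hypothetical helper written for this illustration.

```python
def never_in_utf8(byte_value: int) -> bool:
    """True if byte_value appears in the UTF-8 encoding of no code point at all."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in strict UTF-8
        if byte_value in chr(cp).encode("utf-8"):
            return False
    return True


print(never_in_utf8(0xFF))        # True  (scans all ~1.1M code points, takes a moment)
print(never_in_utf8(0xA3))        # False (0xA3 is a valid continuation byte, e.g. in U+00A3)
print(b"\xff".decode("latin-1"))  # ÿ
```

By contrast with the lone £ from the first answer, no preceding byte can "rescue" 0xFF into a valid sequence, so there's no unlucky case to guard against.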

Stephen Chu 2010-07-09 17:18:59

Cheeky! I will try it...

Dougal 2010-07-09 17:36:01


How to "force" a file's ISO-8859-1ness?
