ansaurus

Question

Why isn't the Byte Order Mark emitted from UTF8Encoding.GetBytes?

Answer 1

+5 A:

You wouldn't want the BOM emitted on every call to GetBytes; otherwise you'd have no way of (say) writing a file a line at a time without a BOM turning up at the start of every line.

By exposing it with GetPreamble, callers can insert the preamble just at the appropriate point (i.e. at the start of their data). I agree that the documentation could be a lot clearer though.

Jon Skeet 2009-01-07 16:06:41
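
To illustrate the point above, here is a minimal sketch of writing data a line at a time while emitting the preamble only once. The class name PreambleOnceDemo and the helper EncodeLines are hypothetical names made up for this example; only GetPreamble and GetBytes come from the discussion.

```csharp
using System;
using System.IO;
using System.Text;

static class PreambleOnceDemo
{
    // Hypothetical helper: writes the preamble once, then each line's bytes.
    public static byte[] EncodeLines(Encoding enc, params string[] lines)
    {
        using (var stream = new MemoryStream())
        {
            // The BOM (or an empty array if the encoding was created without one).
            byte[] preamble = enc.GetPreamble();
            stream.Write(preamble, 0, preamble.Length);

            foreach (string line in lines)
            {
                // GetBytes returns only the encoded text, never a BOM.
                byte[] chunk = enc.GetBytes(line + "\n");
                stream.Write(chunk, 0, chunk.Length);
            }
            return stream.ToArray();
        }
    }

    static void Main()
    {
        byte[] bytes = EncodeLines(new UTF8Encoding(true), "a", "b");
        Console.WriteLine(BitConverter.ToString(bytes)); // EF-BB-BF-61-0A-62-0A
    }
}
```

If GetBytes emitted the BOM itself, the three bytes EF BB BF would appear before both "a" and "b" in this output.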

In general, you should be able to ignore the preamble, since a writer such as StreamWriter will insert it at the start of the stream based on your encoding choice.

Ishmael 2009-01-23 19:17:06
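
A small sketch of that behaviour, assuming the writer in question is a StreamWriter over a fresh MemoryStream. The class name WriterPreambleDemo and the helper WriteTwice are hypothetical, chosen only for this example.

```csharp
using System;
using System.IO;
using System.Text;

static class WriterPreambleDemo
{
    // Hypothetical helper: writes the same text twice through a StreamWriter
    // and returns the raw bytes that reached the underlying stream.
    public static byte[] WriteTwice(Encoding enc, string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, enc))
            {
                writer.Write(text);
                writer.Write(text);
            } // disposing the writer flushes it and closes the stream

            // MemoryStream.ToArray still works after the stream is closed.
            return stream.ToArray();
        }
    }

    static void Main()
    {
        // The encoding's preamble is written once, before the first data.
        Console.WriteLine(BitConverter.ToString(WriteTwice(new UTF8Encoding(true), "a")));  // EF-BB-BF-61-61
        // An encoding created with false has an empty preamble, so no BOM appears.
        Console.WriteLine(BitConverter.ToString(WriteTwice(new UTF8Encoding(false), "a"))); // 61-61
    }
}
```

The writer, not GetBytes, decides when the preamble goes out, which is why two writes still produce a single BOM.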

Answer 2

+3 A:

Because GetBytes() is expected to be called many times over the course of writing one piece of data. To get the BOM, use:

byte[] preamble = enc.GetPreamble();

Call it once, at the start of a sequence, and write the result before any GetBytes() output; this is where the BOM lives.

Marc Gravell 2009-01-07 16:07:24

Answer 3

+2 A:

Thank you both. The following works, and LINQ makes the combination simple :-)

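using System.Linq;   // Concat and ToArray
using System.Text;   // UTF8Encoding
UTF8Encoding enc = new UTF8Encoding(true);                 // true: GetPreamble() returns the BOM
byte[] data = enc.GetBytes("a");                           // encoded text only, no BOM
byte[] combo = enc.GetPreamble().Concat(data).ToArray();   // BOM followed by the data: EF BB BF 61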

frou 2009-01-07 16:28:31

Answer 4

+2 A:

Note that in general, you don't need the Byte Order Mark for UTF-8 anyway. Its main purpose is to tell UTF-16 BE and UTF-16 LE apart. There is no such thing as UTF-8 LE and UTF-8 BE.

MSalters 2009-01-09 14:07:08

It also allows you to differentiate UTF-8 files from ANSI files.

Ishmael 2009-01-23 19:15:22
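
As a rough sketch of that use, a reader can compare the first bytes of a file against the UTF-8 preamble. The class name BomSniffDemo and the helper StartsWithUtf8Bom are hypothetical names for this example. Note that a missing BOM does not prove the data is not UTF-8; it only means the signature is absent.

```csharp
using System;
using System.Text;

static class BomSniffDemo
{
    // Hypothetical helper: true if the data begins with the UTF-8 preamble.
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        byte[] bom = new UTF8Encoding(true).GetPreamble(); // EF BB BF
        if (data.Length < bom.Length) return false;

        for (int i = 0; i < bom.Length; i++)
        {
            if (data[i] != bom[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        var withBom = new byte[] { 0xEF, 0xBB, 0xBF, 0x61 };
        var withoutBom = new byte[] { 0x61 };
        Console.WriteLine(StartsWithUtf8Bom(withBom));    // True
        Console.WriteLine(StartsWithUtf8Bom(withoutBom)); // False
    }
}
```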

Even Microsoft admits "ANSI" is a confusing name, even when it's used to describe a charset. "ANSI files" don't exist anyway; on Windows all files are binary. (Mainframes did have true text files, but they didn't have "Microsoft ANSI".)

MSalters 2009-02-03 14:34:24
