The snippet says it all :-)
UTF8Encoding enc = new UTF8Encoding(true/*include Byte Order Mark*/);
byte[] data = enc.GetBytes("a");
// data has length 1.
// I expected the BOM to be included. What's up?
You wouldn't want the BOM to be emitted on every call to GetBytes; otherwise you'd have no way of (say) writing a file a line at a time without a BOM landing at the start of each line.
By exposing it through GetPreamble, the encoding lets callers insert the preamble at the one appropriate point (i.e. at the start of their data). I agree that the documentation could be a lot clearer though.
Because GetBytes() is expected to be called many times over the course of a stream, it never emits the BOM itself. You need to use:
byte[] preamble = enc.GetPreamble();
Call it only once, at the start of a sequence, and write the result before any encoded data; this is where the BOM lives.
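To make the "preamble once, then many GetBytes calls" pattern concrete, here is a minimal sketch. The class name BomDemo and the helper EncodeLines are hypothetical, made up for illustration; only GetPreamble, GetBytes and MemoryStream come from the framework.

```csharp
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper (not part of the framework): writes the
    // encoding's preamble once, then the bytes of each line in turn.
    public static byte[] EncodeLines(Encoding enc, params string[] lines)
    {
        using (var stream = new MemoryStream())
        {
            // The BOM, if the encoding was configured to provide one.
            byte[] preamble = enc.GetPreamble();
            stream.Write(preamble, 0, preamble.Length);

            foreach (string line in lines)
            {
                // GetBytes never adds a BOM, so repeated calls are safe.
                byte[] data = enc.GetBytes(line + "\n");
                stream.Write(data, 0, data.Length);
            }

            return stream.ToArray();
        }
    }
}
```

Calling BomDemo.EncodeLines(new UTF8Encoding(true), "a", "b") should give the three BOM bytes followed by the two encoded lines, with no BOM in between.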
Thank you both. The following works, and LINQ makes the combination simple :-)
UTF8Encoding enc = new UTF8Encoding(true);
byte[] data = enc.GetBytes("a");
byte[] combo = enc.GetPreamble().Concat(data).ToArray();
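As a side note, if the bytes are headed for a stream anyway, StreamWriter follows the same rule: as far as I know it writes the encoding's preamble once, when it starts at the beginning of a stream, and not again afterwards. A rough sketch, where WriterDemo and WriteText are hypothetical names used only for this example:

```csharp
using System.IO;
using System.Text;

public static class WriterDemo
{
    // Hypothetical helper: lets StreamWriter handle the preamble
    // instead of concatenating it by hand.
    public static byte[] WriteText(Encoding enc, params string[] parts)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, enc))
        {
            foreach (string part in parts)
            {
                writer.Write(part);
            }
        }
        // ToArray still works after the writer has closed the stream.
        return stream.ToArray();
    }
}
```

With new UTF8Encoding(true) this should produce the same bytes as the Concat version above; with new UTF8Encoding(false) no BOM is written.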
Note that in general, you don't need the Byte Order Mark for UTF-8 anyway. Its main purpose is to tell UTF-16 BE and UTF-16 LE apart. UTF-8 is defined as a sequence of single bytes, so there is no such thing as UTF-8 LE or UTF-8 BE.
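If you want to see the difference for yourself, comparing the preambles is a quick check. This is only a sketch; PreambleDemo and PreambleHex are hypothetical names, while UnicodeEncoding(bigEndian, byteOrderMark) is the framework's UTF-16 encoding.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Hypothetical helper: formats an encoding's preamble as hex,
    // e.g. "EF-BB-BF" for UTF-8 with the BOM enabled.
    public static string PreambleHex(Encoding enc)
    {
        return BitConverter.ToString(enc.GetPreamble());
    }
}
```

PreambleHex(new UnicodeEncoding(false, true)) gives FF-FE (little endian) and PreambleHex(new UnicodeEncoding(true, true)) gives FE-FF (big endian), so the two byte orders are distinguishable. For UTF-8 there is only one possible value, EF-BB-BF, so the mark identifies the encoding and says nothing about byte order.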