Can anyone explain the difference between calling GetPreamble() on a newly instantiated UTF-8 encoding and calling it on the public instances exposed by the Encoding class?

byte[] p1 = Encoding.UTF8.GetPreamble();
byte[] p2 = new UTF8Encoding().GetPreamble();

p1 is the normal 3-byte UTF-8 preamble, but p2 ends up being empty, which seems very wrong.

+3  A: 

The difference is that the UTF8 property of Encoding is created this way:

new UTF8Encoding(true)

This sets encoderShouldEmitUTF8Identifier = true, so GetPreamble() returns the 3-byte preamble.

Your code instead calls the default constructor

new UTF8Encoding() 

which is equivalent to

new UTF8Encoding(false)

so GetPreamble() returns an empty array.

To get the same results:

byte[] p1 = Encoding.UTF8.GetPreamble();
byte[] p2 = new UTF8Encoding(true).GetPreamble();
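
As a minimal sketch of the point above, the three ways of obtaining a UTF-8 encoding can be compared side by side. The class name `PreambleDemo` and the helper `PreambleLength` are made up for illustration; only the `Encoding` and `UTF8Encoding` calls come from the framework.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: returns how many preamble (BOM) bytes an encoding reports.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    static void Main()
    {
        // Shared instance exposed by the Encoding class: reports the preamble.
        Console.WriteLine(PreambleLength(Encoding.UTF8));           // 3

        // Default constructor: same as passing false, so no preamble.
        Console.WriteLine(PreambleLength(new UTF8Encoding()));      // 0

        // Explicit flag: matches Encoding.UTF8.
        Console.WriteLine(PreambleLength(new UTF8Encoding(true)));  // 3
    }
}
```

Note that the flag only changes what GetPreamble() reports; GetBytes() does not prepend the preamble in either case, so it is up to the caller (or a writer that consults GetPreamble()) to emit it.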
MarcosMeli
That makes sense, thank you.
Christian Ernst Rysgaard
I ran into problems with that a while ago, glad it helps :)
MarcosMeli
+1  A: 

So my code that gets all known preambles now looks like this:

var preambles = new Dictionary<string, byte[]>();
foreach (var encodingInfo in Encoding.GetEncodings()) {
    Encoding encoding = Encoding.GetEncoding(encodingInfo.Name);
    var preamble = encoding.GetPreamble();
    if (preamble != null && preamble.Length > 0)
        preambles.Add(encodingInfo.Name, preamble);
}

Turns out there aren't a lot of them:

utf-16      [2] 255 254
unicodeFFFE [2] 254 255
utf-32      [4] 255 254 0 0
utf-32BE    [4] 0 0 254 255
utf-8       [3] 239 187 191

This way I can write code that safely converts a byte array with an optional preamble to a string, just by supplying a default encoding for input that has no preamble. Yay
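
One possible shape for that conversion is sketched below. It is an illustration, not the code actually used here: the names `PreambleDecoder` and `Decode` are hypothetical, and the candidate list is built from explicit constructors for the five encodings in the table rather than from the dictionary above. Candidates are ordered longest preamble first, because the UTF-32 little-endian preamble (255 254 0 0) begins with the UTF-16 little-endian one (255 254).

```csharp
using System;
using System.Linq;
using System.Text;

static class PreambleDecoder
{
    // Encodings with a preamble, longest preamble first so that
    // UTF-32 LE (FF FE 00 00) is tested before UTF-16 LE (FF FE).
    static readonly Encoding[] Candidates =
    {
        new UTF32Encoding(bigEndian: false, byteOrderMark: true),
        new UTF32Encoding(bigEndian: true, byteOrderMark: true),
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: true),
        new UnicodeEncoding(bigEndian: false, byteOrderMark: true),
        new UnicodeEncoding(bigEndian: true, byteOrderMark: true),
    };

    // Decodes data using the encoding whose preamble it starts with,
    // or with the supplied fallback when no known preamble is present.
    public static string Decode(byte[] data, Encoding fallback)
    {
        foreach (Encoding candidate in Candidates)
        {
            byte[] preamble = candidate.GetPreamble();
            if (data.Length >= preamble.Length &&
                data.Take(preamble.Length).SequenceEqual(preamble))
            {
                // Skip the preamble bytes so they do not end up in the string.
                return candidate.GetString(data, preamble.Length, data.Length - preamble.Length);
            }
        }
        return fallback.GetString(data);
    }

    static void Main()
    {
        byte[] withPreamble = { 0xEF, 0xBB, 0xBF, 0x68, 0x69 }; // UTF-8 preamble + "hi"
        byte[] withoutPreamble = { 0x68, 0x69 };                // plain "hi"

        Console.WriteLine(Decode(withPreamble, Encoding.ASCII));    // hi
        Console.WriteLine(Decode(withoutPreamble, Encoding.ASCII)); // hi
    }
}
```

One caveat with this approach: UTF-16 little-endian data whose first character after the preamble is U+0000 is indistinguishable from UTF-32 little-endian, so the sketch would treat it as UTF-32.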

Christian Ernst Rysgaard