views: 1957
answers: 5

We recently came across some sample code from a vendor for hashing a secret key for a web service call. Their sample was in VB.NET, which we converted to C#. The conversion caused the hash to be computed over different input bytes. It turns out they were generating the encryption key by converting a char array to a string and back to a byte array. This led me to the discovery that the default encoders in VB.NET and C# appear to work differently for some characters.

C#:

Console.Write(Encoding.Default.GetBytes(new char[] { (char)149 })[0]);

VB:

Dim b As Char() = {Chr(149)}
Console.WriteLine(Encoding.Default.GetBytes(b)(0))

The C# output is 63, while VB gives the expected byte value of 149. If you use other values, such as 145, the outputs match.

Stepping through in the debugger, the default encoder in both VB and C# is SBCSCodePageEncoding.

Does anyone know why this is?

I have corrected the sample code by initializing a byte array directly, which is what it should have done in the first place, but I still want to know why the encoder, which should not be language specific, appears to be just that.
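For reference, the fix is simply something like this (a sketch; the key bytes shown are placeholders, not the vendor's actual key):

// Placeholder key bytes; the real values come from the vendor.
// Initializing the byte array directly avoids the char/string round
// trip through a code-page-dependent encoding.
byte[] key = new byte[] { 0x95, 0x91, 0x12, 0x34 };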

A: 

The default encoding is machine dependent, as well as thread dependent, because it uses the current code page. You should generally use something like Encoding.UTF8 so that you don't have to worry about what happens when one machine is using Unicode and another is using the 1252 ANSI code page.
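For example, a round trip through Encoding.UTF8 is lossless for (char)149, whatever code page the machine happens to use (a minimal sketch, not the vendor's code):

char[] chars = { (char)149 };
// UTF-8 can represent U+0095 directly, so nothing gets replaced with '?'
byte[] bytes = Encoding.UTF8.GetBytes(chars);              // 0xC2 0x95
Console.WriteLine((int)Encoding.UTF8.GetChars(bytes)[0]);  // 149 on any machine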

JasonRShaver
A: 

Different operating systems might use different encodings as the default. Therefore, data streamed from one operating system to another might be translated incorrectly. To ensure that the encoded bytes are decoded properly, your application should use a Unicode encoding, that is, UTF8Encoding, UnicodeEncoding, or UTF32Encoding, with a preamble. Another option is to use a higher-level protocol to ensure that the same format is used for encoding and decoding.

from http://msdn.microsoft.com/en-us/library/system.text.encoding.default.aspx

Can you check what each language produces when you explicitly encode using UTF-8?
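In C#, for instance, the check could look like this (a sketch; the VB side would need ChrW(149) to start from the same char value):

byte[] utf8 = Encoding.UTF8.GetBytes(new char[] { (char)149 });
Console.WriteLine(BitConverter.ToString(utf8)); // C2-95, regardless of code page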

marduk
+11  A: 

If you use ChrW(149) you will get a different result: 63, the same as the C# code.

Dim b As Char() = {ChrW(149)}
Console.WriteLine(Encoding.Default.GetBytes(b)(0))

Read the documentation to see the difference; that will explain the answer.

RichardOD
Here's a link to the documentation: http://msdn.microsoft.com/en-us/library/613dxh46(VS.80).aspx
Jon B
Cheers Jon- I was just in the process of adding a link.
RichardOD
Thanks! I was thinking it had something to do with the Chr() bit, but I wasn't sure how to avoid using Chr() in VB.NET.
Coderuckus
Glad I solved the mystery for you.
RichardOD
+4  A: 

The VB Chr function takes an argument in the range 0 to 255, and converts it to a character using the current default code page. It will throw an exception if you pass an argument outside this range.

ChrW takes a 16-bit value and returns the corresponding System.Char value without using an encoding, and hence gives the same result as the C# code you posted.

The approximate equivalent of your VB code in C#, without using the VB Strings class (the class that contains Chr and ChrW), would be:

// Decode byte 149 using the current ANSI code page, as VB's Chr does;
// on Windows-1252 this yields U+2022 (the bullet character).
char[] chars = Encoding.Default.GetChars(new byte[] { 149 });
// Encoding that char back gives 149 again, matching the VB output.
Console.Write(Encoding.Default.GetBytes(chars)[0]);
Joe
A: 

I believe the equivalent in VB is ChrW(149).

So, this VB code...

    Dim c As Char() = New Char() { ChrW(149) }
    'Dim c As Char() = New Char() { Chr(149) } ' Chr maps through the code page: prints 8226 then 149 on Windows-1252
    Dim b As Byte() = System.Text.Encoding.Default.GetBytes(c)
    Console.WriteLine("{0}", Convert.ToInt32(c(0)))
    Console.WriteLine("{0}", CInt(b(0)))

produces the same output as this C# code...

    var c = new char[] { (char)149 };
    var b = System.Text.Encoding.Default.GetBytes(c);
    Console.WriteLine("{0}", (int)c[0]);
    Console.WriteLine("{0}", (int)b[0]);
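On a Windows-1252 machine, both snippets print 149 and then 63: ChrW(149) and (char)149 are both U+0095, which has no mapping in code page 1252, so Encoding.Default encodes it as '?' (byte 63).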
Cheeso