Hi, I'm a beginner in programming and network development. I have a question regarding ASCII and Unicode encoding.

MSDN and other web examples do the following:

byte[] byteData = Encoding.ASCII.GetBytes(data);

Is this because these code samples are old? Shouldn't it be:

byte[] byteData = Encoding.Unicode.GetBytes(data);

Thanks for your input!

+9  A: 

It depends - do you want the result to be in ASCII or UTF-16? Each is wrong when you want the other.

If you're talking some network protocol, you must find out which character encoding is expected by the protocol. Use the wrong encoding, and Bad Things Will Happen.

Of course ASCII has massive restrictions - it's very English-based (Latin characters only, no accents) but it's correct for some protocols. Others may use UTF-16 (Encoding.Unicode), UTF-8 or other encodings... or they'll let you specify the encoding yourself within the protocol.

Jon Skeet
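To make the difference concrete, here is a minimal sketch comparing what the three encodings mentioned above do with the same string. The class and method names (`EncodingComparison`, `ByteCounts`, `AsciiRoundTrip`) are made up for illustration; only the `System.Text.Encoding` calls are real .NET API.

```csharp
using System;
using System.Text;

static class EncodingComparison
{
    // Number of bytes each encoding produces for the same string.
    public static (int Ascii, int Utf8, int Utf16) ByteCounts(string text) =>
        (Encoding.ASCII.GetByteCount(text),
         Encoding.UTF8.GetByteCount(text),
         Encoding.Unicode.GetByteCount(text));   // Encoding.Unicode is UTF-16LE

    // Encode with ASCII and decode again. Anything outside ASCII is
    // silently replaced with '?', so the original text is lost.
    public static string AsciiRoundTrip(string text) =>
        Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));
}
```

For `"h\u00e9llo"` (five characters, one of them an accented e), `ByteCounts` gives 5 bytes for ASCII, 6 for UTF-8 and 10 for UTF-16, and `AsciiRoundTrip` returns `"h?llo"`. So ASCII is the smallest here only because it threw the accent away.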
I'm implementing both sides of the client/server, so I get to decide. I'm seeing Unicode as the choice, as it is capable of accommodating different languages. I may be wrong!
iEisenhower
@ikurtz: I'd suggest using UTF-8 instead of Unicode for most things, as it *usually* cuts down on traffic. There are exceptions to that (Far East characters often end up being bigger in UTF-8) and there are other downsides (you can't calculate the size of binary data directly from the number of code points) but it's generally a good idea IMO.
Jon Skeet
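Since the byte count of a UTF-8 message can't be derived from the string length, one common way around it when you control both ends is to send the byte length first. The sketch below assumes a hypothetical wire format (4-byte big-endian length, then the UTF-8 payload); the `Framing` class and its methods are illustrative names, not part of any protocol discussed here. It also assumes a runtime where `System.Buffers.Binary.BinaryPrimitives` is available (.NET Core 2.1 or later).

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

static class Framing
{
    // Hypothetical frame layout: [4-byte big-endian payload length][UTF-8 payload].
    public static byte[] Frame(string message)
    {
        byte[] payload = Encoding.UTF8.GetBytes(message);
        byte[] frame = new byte[4 + payload.Length];

        // The prefix is the number of BYTES, not the number of characters.
        BinaryPrimitives.WriteInt32BigEndian(frame.AsSpan(0, 4), payload.Length);
        payload.CopyTo(frame, 4);
        return frame;
    }

    // Reads the length prefix, then decodes exactly that many bytes.
    public static string Unframe(byte[] frame)
    {
        int length = BinaryPrimitives.ReadInt32BigEndian(frame.AsSpan(0, 4));
        return Encoding.UTF8.GetString(frame, 4, length);
    }
}
```

With this, `Framing.Frame("h\u00e9llo")` is 10 bytes long (4 for the prefix, 6 for the payload) even though the string has only 5 characters, and `Unframe` gives the original string back. A real receiver would also have to loop until the whole frame has arrived; that part is left out.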
Very good point concerning getting the message size! I understand why they use ASCII.
iEisenhower
I second using UTF-8, it's almost always the best choice for storage and on the wire. UTF-16LE (which Microsoft very misleadingly call “Unicode”) is only really good for fast in-memory access. However, if you ask for `Encoding.UTF8` in .NET you generally get an unfortunate variant of UTF-8 that puts a bogus, troublesome BOM at the front of the bytes. To get clean UTF-8 you have to say `new UTF8Encoding(false)`.
bobince
@bobince: That depends on how you're using the encoding. Encoding.GetBytes doesn't emit the BOM, for example.
Jon Skeet
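The two comments above can be reconciled with a small experiment: the BOM comes from the encoding's preamble, which `GetBytes` never writes but a `StreamWriter` on a fresh stream does. This is a rough sketch; `BomDemo` and `WriteThroughStreamWriter` are names invented for the example.

```csharp
using System.IO;
using System.Text;

static class BomDemo
{
    // Bytes that end up in a fresh stream when text is written
    // through a StreamWriter using the given encoding.
    public static byte[] WriteThroughStreamWriter(string text, Encoding encoding)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        }
        // ToArray still works after the writer has closed the stream.
        return stream.ToArray();
    }
}
```

`Encoding.UTF8.GetBytes("A")` is a single byte (0x41), while `BomDemo.WriteThroughStreamWriter("A", Encoding.UTF8)` is four bytes: EF BB BF followed by 0x41. Passing `new UTF8Encoding(false)` instead gives just the one byte, because its `GetPreamble()` is empty.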