Which Unicode encoding should I use when encoding a .NET string to Base64? I know that strings are UTF-16 encoded on Windows, so is my way of encoding them the right one?

public static String ToBase64String(this String source) {
    return Convert.ToBase64String(Encoding.Unicode.GetBytes(source));
}
A: 

What you've provided is perfectly functional. It will produce a base64-encoded string of the bytes of your source string encoded in UTF-16.

If you're asking whether UTF-16 can represent any character in your string, then yes. Unlike UTF-32, which uses four bytes for every code point, UTF-16 is a variable-length encoding: it uses two bytes for characters in the Basic Multilingual Plane (U+0000 to U+FFFF) and four bytes (a surrogate pair) for all other characters.

There are no Unicode characters that cannot be represented by UTF-16.
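
To make the variable-length point concrete, here is a minimal sketch that counts the bytes `Encoding.Unicode` produces for one character inside the Basic Multilingual Plane and one outside it. The class name `Utf16WidthDemo` and the helper `Utf16ByteCount` are hypothetical names chosen for illustration, not part of the original answer.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the name is an assumption made for this sketch.
public static class Utf16WidthDemo
{
    // Number of bytes Encoding.Unicode (UTF-16 little-endian) produces for the text.
    public static int Utf16ByteCount(string text) => Encoding.Unicode.GetByteCount(text);

    public static void Main()
    {
        string bmp = "A";              // U+0041, inside the Basic Multilingual Plane
        string astral = "\U0001F600";  // U+1F600, outside the BMP

        Console.WriteLine(Utf16ByteCount(bmp));     // 2
        Console.WriteLine(Utf16ByteCount(astral));  // 4
        Console.WriteLine(astral.Length);           // 2 (two UTF-16 code units, one surrogate pair)
    }
}
```

Note that `string.Length` counts UTF-16 code units rather than characters, which is why the second string reports a length of 2.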

Adam Robinson
A: 

MSDN confirms that the UnicodeEncoding class represents a UTF-16 encoding of Unicode characters.

abatishchev
If my string contains more than just English letters and decimal digits, will it still work properly?
chester89
@chester89: That is what Unicode is for!
abatishchev
A: 

Be aware that you don't have to use UTF-16 just because that's what .NET strings use. When you create that byte array, you're free to choose any encoding that can handle all the characters in your string. For example, UTF-8 is more compact when the text is mostly in a Latin-based language, and it can still represent every Unicode character.
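
As a rough illustration of the size difference, the sketch below compares the length of the Base64 output for the same text under UTF-8 and UTF-16. The class `EncodingSizeDemo` and the helper `Base64Length` are hypothetical names, and the sample strings are arbitrary. The second sample is an added observation rather than something from the answer: for some non-Latin text the comparison can go the other way.

```csharp
using System;
using System.Text;

// Hypothetical demo class; names and sample strings are assumptions for this sketch.
public static class EncodingSizeDemo
{
    // Length of the Base64 text produced when the string is first turned into bytes
    // with the given encoding.
    public static int Base64Length(string text, Encoding encoding) =>
        Convert.ToBase64String(encoding.GetBytes(text)).Length;

    public static void Main()
    {
        string latin = "Hello, world";          // 12 ASCII characters
        string cjk = "\u65E5\u672C\u8A9E";      // 3 CJK characters

        Console.WriteLine(Base64Length(latin, Encoding.UTF8));     // 16 (12 bytes)
        Console.WriteLine(Base64Length(latin, Encoding.Unicode));  // 32 (24 bytes)

        Console.WriteLine(Base64Length(cjk, Encoding.UTF8));       // 12 (9 bytes)
        Console.WriteLine(Base64Length(cjk, Encoding.Unicode));    // 8  (6 bytes)
    }
}
```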

The most important concern is that whatever software decodes the Base64 string needs to know which encoding to apply to the byte array to re-create the original string.
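
A minimal round-trip sketch of that concern, assuming both sides are .NET code: decoding with the same encoding restores the string, while decoding with a different one does not. The class `Base64TextDemo` and the methods `ToBase64` and `FromBase64` are hypothetical helpers written for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class Base64TextDemo
{
    // Encode: text -> bytes (in the chosen encoding) -> Base64 text.
    public static string ToBase64(string text, Encoding encoding) =>
        Convert.ToBase64String(encoding.GetBytes(text));

    // Decode: Base64 text -> bytes -> text, interpreting the bytes with the given encoding.
    public static string FromBase64(string base64, Encoding encoding) =>
        encoding.GetString(Convert.FromBase64String(base64));

    public static void Main()
    {
        string original = "caf\u00E9";
        string encoded = ToBase64(original, Encoding.UTF8);

        // Same encoding on both sides: the original string comes back.
        Console.WriteLine(FromBase64(encoded, Encoding.UTF8) == original);     // True

        // Mismatched encoding: the bytes are misread and the text is garbled.
        Console.WriteLine(FromBase64(encoded, Encoding.Unicode) == original);  // False
    }
}
```

Base64 itself carries no record of the text encoding, so the choice has to be agreed on out of band, for example in a protocol specification or an accompanying field.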

Alan Moore