+3  Q: 

C# and utf8_decode

Is there a C# utf8_decode equivalent?

+3  A: 

Yes. You can use the System.Text.Encoding class to convert the encoding.

string source = "Déjà vu";
Encoding unicode = Encoding.Unicode;
// iso-8859-1 <- codepage 28591
Encoding latin1 = Encoding.GetEncoding(28591); 
Byte[] result = Encoding.Convert(unicode, latin1, unicode.GetBytes(source));
// result contains the byte sequence for the latin1 encoded string

edit: or simply

string source = "Déjà vu";
Byte[] latin1 = Encoding.GetEncoding(28591).GetBytes(source);

A string (System.String) is always Unicode encoded, i.e. if you convert the byte sequence back to a string (Encoding.GetString()), your data will again be stored as UTF-16 code units.
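If the input is already a UTF-8 byte sequence rather than a System.String, the same Encoding.Convert call can go straight from UTF-8 to latin1, which is roughly what PHP's utf8_decode does. A minimal sketch, assuming that is the goal; the class and the helper name Utf8ToLatin1 are made up for illustration:

```csharp
using System;
using System.Text;

public static class Utf8DecodeSketch
{
    // Hypothetical helper: re-encodes UTF-8 bytes as ISO-8859-1 (codepage 28591) bytes.
    public static byte[] Utf8ToLatin1(byte[] utf8Bytes)
    {
        Encoding latin1 = Encoding.GetEncoding(28591);
        return Encoding.Convert(Encoding.UTF8, latin1, utf8Bytes);
    }

    public static void Main()
    {
        // "Déjà vu", written with escapes so the source file encoding does not matter.
        byte[] utf8 = Encoding.UTF8.GetBytes("D\u00E9j\u00E0 vu"); // 9 bytes
        byte[] latin1 = Utf8ToLatin1(utf8);                        // 7 bytes
        Console.WriteLine(BitConverter.ToString(latin1));          // prints 44-E9-6A-E0-20-76-75
    }
}
```

Characters that have no latin1 representation cannot survive this conversion, so it only makes sense when the text is known to fit into iso-8859-1.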

VolkerK
+9  A: 

Use the Encoding class.

For example:

byte[] bytes = something; // placeholder: any UTF-8 encoded byte sequence
string str = Encoding.UTF8.GetString(bytes);
SLaks
nitpicking: the example is more like utf8_**en**code().
VolkerK
A: 

If your input is a string, here is a method that would probably work (assuming you're from Western Europe :)

public string Utf8Decode(string inputDate)
{
    return Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes(inputDate));
}

Of course, if the current encoding of inputDate is not latin1, change "iso-8859-1" to the correct encoding.

Manitra Andriamitondra
This will return a System.String as if `inputDate` was (falsely) utf8 encoded but really contains an iso-8859-1 byte sequence. E.g. inputDate="Déjà vu". UTF8.GetBytes() returns the sequence {68, 195, 169, 106, 195, 160, 32, 118, 117}. GetEncoding("iso-8859-1").GetString() will interpret each single byte as a character (since it's a single-byte encoding). The resulting string is `DÃ©jÃ  vu` (byte 160 becomes a non-breaking space).
VolkerK
Hello, the user was referring to a PHP function. It is probably because he has some wrongly encoded "string", I mean something like 'DÃ©jÃ  vu', and wants it to become 'Déjà vu'. This happens when you communicate with a MySQL server that uses utf8 encoding and you forget to specify the utf8 charset in the connection string. But I agree with you, utf8decode on a .NET string is not a good thing.
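For the wrongly decoded case described here, the conversion has to run in the opposite direction from the method in the answer: get the original bytes back via iso-8859-1, then read them as UTF-8. A minimal sketch, assuming the garbled string really came from UTF-8 bytes that were read as latin1; the class and the helper name RepairLatin1Mojibake are hypothetical:

```csharp
using System;
using System.Text;

public static class MojibakeRepairSketch
{
    // Hypothetical helper: undoes "UTF-8 bytes were decoded as ISO-8859-1".
    public static string RepairLatin1Mojibake(string garbled)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] originalBytes = latin1.GetBytes(garbled); // one byte per character
        return Encoding.UTF8.GetString(originalBytes);   // reinterpret the bytes as UTF-8
    }

    public static void Main()
    {
        // The garbled form of "Déjà vu": D, Ã, ©, j, Ã, non-breaking space, space, v, u.
        string garbled = "D\u00C3\u00A9j\u00C3\u00A0 vu";
        string repaired = RepairLatin1Mojibake(garbled);
        Console.WriteLine(repaired == "D\u00E9j\u00E0 vu"); // prints True
    }
}
```

This only helps when the wrong decoding step was latin1. Fixing the charset in the connection string is still the cleaner solution, since the data then never gets garbled in the first place.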
Manitra Andriamitondra