Is there a C# utf8_decode equivalent?
+3
A:
Yes. You can use the System.Text.Encoding class to convert the encoding.
string source = "Déjà vu";
Encoding unicode = Encoding.Unicode;
// iso-8859-1 <- codepage 28591
Encoding latin1 = Encoding.GetEncoding(28591);
Byte[] result = Encoding.Convert(unicode, latin1, unicode.GetBytes(source));
// result contains the byte sequence for the latin1 encoded string
edit: or simply
string source = "Déjà vu";
Byte[] latin1 = Encoding.GetEncoding(28591).GetBytes(source);
A string (System.String) is always Unicode (UTF-16) encoded, i.e. if you convert the byte sequence back to a string (Encoding.GetString()), your data will again be stored as UTF-16 code units.
VolkerK
2009-10-19 13:28:09
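To make the byte-level difference concrete, here is a minimal sketch of the conversion described above. It assumes C# 9 or later (top-level statements); the variable names are illustrative, and the literal is written with escapes so the source file's own encoding cannot affect the result.

```csharp
using System;
using System.Text;

// "Déjà vu", written with escapes so the file encoding does not matter.
string source = "D\u00E9j\u00E0 vu";

// ISO-8859-1 (code page 28591) needs exactly one byte per character here.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
byte[] latin1Bytes = latin1.GetBytes(source);

// UTF-8 needs two bytes each for 'é' and 'à'.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(source);

// Decoding with the matching encoding gives back the original string.
string roundTrip = latin1.GetString(latin1Bytes);

Console.WriteLine(BitConverter.ToString(latin1Bytes)); // 44-E9-6A-E0-20-76-75
Console.WriteLine(BitConverter.ToString(utf8Bytes));   // 44-C3-A9-6A-C3-A0-20-76-75
Console.WriteLine(roundTrip == source);                // True
```

The encoding only matters at the byte boundary: once the bytes are decoded back into a string, the data is held as UTF-16 again.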
+9
A:
Use the Encoding class.
For example:
byte[] bytes = something;
string str = Encoding.UTF8.GetString(bytes);
SLaks
2009-10-19 13:29:02
nitpicking: the example is more like utf8_**en**code().
VolkerK
2009-10-19 14:23:03
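As a small illustration of the byte-to-string direction, the sketch below fills in the `something` placeholder with a concrete UTF-8 byte sequence for "Déjà vu". It assumes C# 9 or later (top-level statements); the sample bytes are chosen for illustration only.

```csharp
using System;
using System.Text;

// UTF-8 bytes for "Déjà vu": 'é' is 195,169 and 'à' is 195,160.
byte[] bytes = { 68, 195, 169, 106, 195, 160, 32, 118, 117 };

// Interpret the bytes as UTF-8; the result is an ordinary .NET string.
string str = Encoding.UTF8.GetString(bytes);

Console.WriteLine(str.Length);                 // 7
Console.WriteLine(str == "D\u00E9j\u00E0 vu"); // True
```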
A:
If your input is a string, here is a method that would probably work (assuming you're from Western Europe :)
public string Utf8Decode(string inputDate)
{
    return Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes(inputDate));
}
Of course, if the current encoding of `inputDate` is not Latin-1, change "iso-8859-1" to the correct encoding.
Manitra Andriamitondra
2009-10-19 13:40:10
This will return a System.String built as if the UTF-8 byte sequence of `inputDate` were (falsely) an iso-8859-1 byte sequence. E.g. inputDate="Déjà vu". UTF8.GetBytes() returns the sequence {68, 195, 169, 106, 195, 160, 32, 118, 117}. GetEncoding("iso-8859-1").GetString() will interpret each single byte (since it's a single-byte encoding) as a character. The resulting string is `DÃ©jÃ  vu`.
VolkerK
2009-10-19 14:12:49
Hello, the user was referring to a PHP function. It is probably because he has some wrongly encoded "string", I mean something like 'DÃ©jÃ  vu', and wants it to become 'Déjà vu'. This happens when you communicate with a MySQL server using UTF-8 encoding and you forget to specify the utf8 charset in the connection string. But I agree with you, utf8decode on a .NET string is not a good thing.
Manitra Andriamitondra
2009-10-20 07:38:23
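The two directions discussed in these comments can be sketched side by side. This is only an illustration under stated assumptions: C# 9 or later (top-level statements), hypothetical variable names, and a garbled string that was produced purely by reading UTF-8 bytes as ISO-8859-1. If the data went through Windows-1252 or was altered afterwards, this repair may not recover the original text.

```csharp
using System;
using System.Text;

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
string original = "D\u00E9j\u00E0 vu"; // "Déjà vu"

// UTF-8 bytes read as ISO-8859-1: every byte becomes one character,
// so the two-byte sequences for 'é' and 'à' turn into mojibake.
string garbled = latin1.GetString(Encoding.UTF8.GetBytes(original));

// Repair: recover the raw bytes via ISO-8859-1, then decode them as UTF-8.
string repaired = Encoding.UTF8.GetString(latin1.GetBytes(garbled));

Console.WriteLine(garbled.Length);       // 9
Console.WriteLine(repaired == original); // True
```

The repair step is the closer match to what PHP's utf8_decode is typically used for in this situation; the first step shows how the garbled form arises.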