Hi. I have the hex data:

48|65|6c|6c|6f|20|53|68|61|72|6f|6b|2e|

20|3d|43|46|3d|46|30|3d|45|38|3d|45|32|3d|45|35|3d|46|32|0d|0a|0d|0a|2e|0d|0a|

The first string is "Hello Sharok" (without quotes). The second string should be "Привет" (without quotes; "Привет" is "Hello" in Russian). How do I convert this into readable text? The first string decodes fine, but the second one fails. Code page: windows-1251

+1  A: 

Create an Encoding object for the windows-1251 encoding, and decode the byte array:

byte[] data = {
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x53, 0x68, 0x61, 0x72, 0x6f, 0x6b, 0x2e
};

string text = Encoding.GetEncoding(1251).GetString(data);

The second set of data doesn't decode into Russian characters, but into this (including a space at the start and a line break (CR+LF) ending each of the three lines):

 =CF=F0=E8=E2=E5=F2

.

To get the string that you want, you would first have to decode the data into a string, then extract the hexadecimal codes from the string, convert those into bytes, and decode those bytes:

Encoding win = Encoding.GetEncoding(1251);
// data here holds the second set of bytes; match only "=" followed by two hex digits
string text = win.GetString(
  Regex.Matches(win.GetString(data), "=([0-9A-Fa-f]{2})")
  .OfType<Match>()
  .Select(m => Convert.ToByte(m.Groups[1].Value, 16))
  .ToArray()
);
Guffa
+2  A: 

The second string is not plain Windows-1251 but quoted-printable: " =CF=F0=E8=E2=E5=F2<CR><LF><CR><LF>.". The bytes encoded by its escapes are Windows-1251. So you need to iterate over the string and build the output one character at a time. When you run into the escape sign (=), the next two characters are the hex digits of a Windows-1251 byte. Decode those two digits and append the resulting character to the output string. Loop until the end.
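
The loop described above could be sketched roughly as follows. This is only an illustration, not a complete quoted-printable decoder: the class name QuotedPrintableSketch and the method Decode are made up for this example, literal characters are assumed to be plain ASCII, and the only extra handled beyond the =XX escape is the soft line break ("=" followed by CR+LF). Bytes are collected first and decoded in one go, so the same code would also work with a multi-byte encoding.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class QuotedPrintableSketch
{
    // Turns =XX escapes into bytes, then decodes all bytes with the
    // given encoding (for this question: Windows-1251).
    public static string Decode(string input, Encoding encoding)
    {
        var buffer = new MemoryStream();
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            bool hasTwoMore = i + 2 < input.Length;

            if (c == '=' && hasTwoMore
                && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                // "=XX": the two hex digits encode one raw byte.
                buffer.WriteByte(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 2;
            }
            else if (c == '=' && hasTwoMore
                && input[i + 1] == '\r' && input[i + 2] == '\n')
            {
                // "=" + CR + LF is a soft line break: it produces no output.
                i += 2;
            }
            else
            {
                // Assumption: literal characters are ASCII, so a cast is safe.
                buffer.WriteByte((byte)c);
            }
        }
        return encoding.GetString(buffer.ToArray());
    }
}
```

It would be called as QuotedPrintableSketch.Decode(text, Encoding.GetEncoding(1251)), where text is the string obtained from the raw bytes. On .NET Core and later, code page 1251 is likely only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once at startup; on the classic .NET Framework that step should not be needed.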

Dialecticus
+1  A: 

For the second one you can use this:

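// Requires: using System.Globalization; using System.Linq; using System.Text;
string input = "20|3d|43|46|3d|46|30|3d|45|38|3d|45|32|3d|45|35|3d|46|32|0d|0a|0d|0a|2e|0d|0a";
Encoding win1251 = Encoding.GetEncoding(1251);

// Step 1: parse the |-separated hex values into bytes and decode them as text.
byte[] bytes = input.Split('|').Select(s => byte.Parse(s, NumberStyles.HexNumber)).ToArray();
string text = win1251.GetString(bytes);

// Step 2: replace each =XX escape with the Windows-1251 character for byte XX.
StringBuilder text2 = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
  // Only treat '=' as an escape when two more characters follow it.
  if (text[i] == '=' && i + 2 < text.Length)
  {
    string hex = text.Substring(i + 1, 2);
    byte b = byte.Parse(hex, NumberStyles.HexNumber);

    text2.Append(win1251.GetString(new byte[] { b }));
    i += 2;
  }
  else
  {
    text2.Append(text[i]);
  }
}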

First it decodes the |-separated string, which contains =-escaped hex values; the loop that follows then decodes those.

CodeInChaos