Which character encoding (or combination of encodings) represents the character ö (U+00F6, LATIN SMALL LETTER O WITH DIAERESIS, or simply chr(246) in ISO-8859-1) as the four-octet sequence chr(195) . chr(63) . chr(194) . chr(164)?
A:
This page lists a fairly comprehensive set of the binary representations of that particular character, and none of them is even close to what you have. Are you certain that no other transformation is being applied on top of the text encoding?
If you think that the data might have been encoded multiple times, try this:
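    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    public static class EncodingPathFinder
    {
        // GetEncodings() returns EncodingInfo objects, so resolve each one to an Encoding once.
        private static readonly List<Encoding> AllEncodings =
            Encoding.GetEncodings().Select(info => info.GetEncoding()).ToList();

        // Returns the chain of encodings that turns desiredChar into data, or null if none is found.
        // The chain alternates encode, decode (misread), encode, ... and always ends with an encode step.
        public static IEnumerable<Encoding> FindEncodingPath(char desiredChar, byte[] data)
        {
            // One round means the bytes were misread once and then re-encoded.
            // Each extra round multiplies the search by (number of encodings)^2, so raise this with care.
            return FindEncodingPath(new char[] { desiredChar }, data, 1);
        }

        private static List<Encoding> FindEncodingPath(char[] chars, byte[] data, int roundsLeft)
        {
            foreach (Encoding enc in AllEncodings)
            {
                byte[] temp = enc.GetBytes(chars);
                if (temp.SequenceEqual(data))
                {
                    return new List<Encoding> { enc };
                }
                if (roundsLeft <= 0)
                {
                    continue;
                }
                // Simulate the bytes being misread in another encoding, then search again from that text.
                foreach (Encoding wrong in AllEncodings)
                {
                    char[] misread = wrong.GetChars(temp);
                    if (misread.SequenceEqual(chars))
                    {
                        continue; // the text did not change, so this step cannot help
                    }
                    List<Encoding> rest = FindEncodingPath(misread, data, roundsLeft - 1);
                    if (rest != null)
                    {
                        List<Encoding> path = new List<Encoding> { enc, wrong };
                        path.AddRange(rest);
                        return path;
                    }
                }
            }
            return null;
        }
    }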
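To illustrate what "encoded multiple times" can look like, here is a minimal sketch of one such round done by hand. It is not a reconstruction of what happened to your data: the choice of UTF-8 and ISO-8859-1 is an assumption for the example, and the class name DoubleEncodingDemo is made up. The last step shows one possible source of a stray chr(63): an encoder that cannot represent a character typically substitutes '?'.

```csharp
using System;
using System.Text;

public static class DoubleEncodingDemo
{
    public static void Main()
    {
        Encoding utf8 = Encoding.UTF8;
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Step 1: encode U+00F6 as UTF-8.
        byte[] first = utf8.GetBytes("\u00F6");
        Console.WriteLine(BitConverter.ToString(first));   // C3-B6

        // Step 2: misread those bytes as ISO-8859-1 (two characters, U+00C3 and U+00B6),
        // then encode the misread text as UTF-8 again.
        string misread = latin1.GetString(first);
        byte[] second = utf8.GetBytes(misread);
        Console.WriteLine(BitConverter.ToString(second));  // C3-83-C2-B6

        // Step 3: an encoder that cannot represent a character falls back to '?' (0x3F).
        byte[] lossy = Encoding.ASCII.GetBytes("\u00F6");
        Console.WriteLine(BitConverter.ToString(lossy));   // 3F
    }
}
```

A chain like this, with a lossy step somewhere in the middle, would produce a mix of high bytes and a literal '?', which is the general shape of the four octets in the question.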
Adam Robinson
2010-04-23 19:48:24
The input could very well be messed up, in the sense that it may have been encoded multiple times :-\
knorv
2010-04-23 21:30:57