I'm reading an XML file from a REST web service, parsing it, and displaying the details in a UITableView. The XML file is encoded as iso-8859-1 and contains accented characters. If I add the string to the table view as-is, a junk character is displayed, so I tried converting it to UTF-8, but the character turns into a question mark, which suggests the conversion doesn't understand it.

Here's the code:

foreach(XmlNode myNode in myNodeList)
{
    Encoding isoEnc = Encoding.GetEncoding ("iso-8859-1");

    string utfResult = Encoding.UTF8.GetString (isoEnc.GetBytes(myNode.InnerText));

    _myCollection.Add(utfResult);
}

Any ideas what's going on here, and how to display the accented chars?

+1  A: 

Well, your "conversion" to UTF-8 is highly suspicious. You're basically saying that you know better than the XML file - that although it claims to be ISO-8859-1, you really know it was encoded in UTF-8. Do you have any reason to believe that?

If you know what the characters are meant to be, I suggest you add some logging to indicate the Unicode values of those characters (as integers) and compare them with the code charts on Unicode.org. Then you'll know whether your problem is in displaying the characters, or reading them from the feed in the first place.
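The logging suggested above could look something like the following minimal sketch. `CodePointDump` and `Describe` are hypothetical names introduced only for illustration; the helper prints each UTF-16 code unit of a string as a `U+XXXX` value so it can be compared against the Unicode code charts.

```csharp
using System;
using System.Text;

static class CodePointDump
{
    // Hypothetical helper: formats every UTF-16 code unit of the string
    // as "U+XXXX", separated by spaces.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.AppendFormat("U+{0:X4}", (int)c);
        }
        return sb.ToString();
    }
}
```

Calling `Console.WriteLine (CodePointDump.Describe (myNode.InnerText));` inside the loop would show what the parser actually produced. A correctly read "Môn" should come out as `U+004D U+00F4 U+006E`; a `U+FFFD` in the output would point at the reading step rather than the display.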

Jon Skeet
Jon, thanks for that. I've done just that and it looks as though the problem is in the reading as opposed to the displaying. The XML is defined with encoding="iso-8859-1", but if I build a byte array of the string taken from the node using myNode.InnerText, then the byte is 3F, which is a question mark. If I build a Unicode array then it gives me back FFFD, which is a question mark in a diamond, which is what is being displayed in the UITableView.
Ira Rainey
Don't build a byte array from the parsed XML - look at the XML itself in a hex editor. Alternatively, use InnerText but don't convert it to a byte array - cast the first character to an integer. Note that U+FFFD is the "replacement" character, which is meant to be used for characters which aren't supported by Unicode. Sounds suspicious.
Jon Skeet
Getting there. The string in question is "Ynys Môn", with the accented char obviously being the problem in this instance. Looking at the XML in a hex editor I can see that the char is F4, which is correct. If I make a string using the above, and display it in the UITableView, then it displays fine. But chop the char out of the InnerText property of that node, convert it to an int, and it gives 65533 (or U+FFFD). Here's the code:

char tmpChar = Convert.ToChar (myNode.InnerText.Substring (6, 1));
int charVal = Convert.ToInt32 (tmpChar);
Console.WriteLine (charVal);

Ira Rainey
This kind of implies that something is being lost in taking the string from the XML doc.
Ira Rainey
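The symptoms described in these comments can be reproduced in isolation. The sketch below is an assumption about the cause, not something taken from the original code: it hard-codes the ISO-8859-1 bytes of "Ynys Môn" (with F4 for the accented character, as seen in the hex editor) and decodes them with a chosen encoding. `MisdecodeDemo` and its members are hypothetical names.

```csharp
using System;
using System.Text;

static class MisdecodeDemo
{
    // "Ynys Môn" as ISO-8859-1 bytes; 0xF4 is the accented o.
    public static readonly byte[] Latin1Bytes =
        { 0x59, 0x6E, 0x79, 0x73, 0x20, 0x4D, 0xF4, 0x6E };

    // Decodes the bytes with the given encoding and returns the
    // numeric value of the character at index 6.
    public static int CharAtIndex6(Encoding encoding)
    {
        return encoding.GetString(Latin1Bytes)[6];
    }

    // Decodes the bytes as UTF-8, then re-encodes the damaged string
    // as ISO-8859-1, mirroring the "conversion" in the question.
    public static byte RoundTripByteAtIndex6()
    {
        string damaged = Encoding.UTF8.GetString(Latin1Bytes);
        return Encoding.GetEncoding("iso-8859-1").GetBytes(damaged)[6];
    }
}
```

Decoding with `Encoding.UTF8` gives 65533 (U+FFFD) at index 6, because a lone F4 byte is not a valid UTF-8 sequence, while decoding with `Encoding.GetEncoding ("iso-8859-1")` gives 244 (U+00F4). Re-encoding the damaged string as ISO-8859-1 yields 3F, matching the question mark reported earlier: once the character has become U+FFFD, the original byte is gone and no later conversion can recover it.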
+2  A: 

OK, problem now solved. It seems that my error was assuming that the StreamReader would deal with the iso-8859-1 encoding by default; it assumes UTF-8 unless told otherwise. I changed my StreamReader constructor from:

StreamReader reader = new StreamReader (response.GetResponseStream ());

to:

StreamReader reader = new StreamReader (response.GetResponseStream (), Encoding.GetEncoding("iso-8859-1"));

By telling the StreamReader to expect the correct encoding, everything else just falls into place.
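For anyone wanting to try this without the web service, here is a minimal sketch under stated assumptions: a hypothetical in-memory payload stands in for `response.GetResponseStream ()`, and the document is assumed to be loaded with `XmlDocument`, which the thread does not confirm. `Latin1XmlDemo` and its method names are made up for illustration. The second method shows an alternative to the fix above: handing the raw stream to the XML parser so that it picks the encoding from the XML declaration itself.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class Latin1XmlDemo
{
    // Hypothetical payload standing in for the service response,
    // encoded as ISO-8859-1 as its declaration claims.
    public static byte[] BuildPayload()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetBytes(
            "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>" +
            "<place>Ynys M\u00F4n</place>");
    }

    // Approach from the answer: tell StreamReader which encoding to expect.
    public static string ReadWithExplicitEncoding(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.GetEncoding("iso-8859-1")))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc.DocumentElement.InnerText;
        }
    }

    // Alternative: pass the raw stream to the parser, which reads the
    // encoding from the XML declaration instead of relying on a default.
    public static string ReadFromRawStream(Stream stream)
    {
        var doc = new XmlDocument();
        doc.Load(stream);
        return doc.DocumentElement.InnerText;
    }
}
```

Both methods should return "Ynys Môn" with U+00F4 intact when given `new MemoryStream (Latin1XmlDemo.BuildPayload ())`. The raw-stream variant has the advantage of not hard-coding the encoding, so it would keep working if the service later switched to UTF-8 and updated its declaration.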

Ira Rainey