I have a simple XmlReader:

XmlReader r = XmlReader.Create(fileName);

while (r.Read())
{
    Console.WriteLine(r.Value);
}

The problem is that the XML file contains ISO-8859-9 characters, which makes XmlReader throw an "Invalid character in the given encoding." exception. I can solve this by adding a <?xml version="1.0" encoding="ISO-8859-9" ?> line at the beginning, but I'd like to solve it another way in case I can't modify the source file. How can I change the encoding of XmlReader?

+2  A: 

The concrete reader that the static Create method returns (XmlReader itself is the abstract base class) is designed to detect the encoding automatically from the XML file itself, and the reader exposes no settable encoding property of its own.

Simply ensure that the file you are reading includes the following XML declaration:

<?xml version="1.0" encoding="ISO-8859-9"?>
Noldorin
+1  A: 

If you can't ensure that the input file has the right header, you could look at one of the other 11 overloads of the XmlReader.Create method.

Some of these take an XmlReaderSettings variable or XmlParserContext variable, or both. I haven't investigated these, but there is a possibility that setting the appropriate values might help here.

There is the XmlReaderSettings.CheckCharacters property - the help for this states:

Instructs the reader to check characters and throw an exception if any characters are outside the range of legal XML characters. Character checking includes checking for illegal characters in the document, as well as checking the validity of XML names (for example, an XML name may not start with a numeral).

So setting this to false might help. However, the help also states:

If the XmlReader is processing text data, it always checks that the XML names and text content are valid, regardless of the property setting. Setting CheckCharacters to false turns off character checking for character entity references.

So further investigation is warranted.
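
As a starting point for that investigation, here is a minimal sketch of the XmlParserContext route. It assumes that the reader honours the Encoding carried by the parser context when it is given a raw Stream; I believe it does, but treat that as something to verify. The class name ParserContextSketch and the helper ReadValues are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class ParserContextSketch
{
    // Hypothetical helper: collects the Value of every node, telling the
    // reader which encoding to assume for the raw bytes via XmlParserContext.
    public static List<string> ReadValues(Stream input, Encoding encoding)
    {
        // Only the encoding is supplied; the name table, namespace manager
        // and xml:lang are left unset.
        var context = new XmlParserContext(null, null, null, XmlSpace.None, encoding);
        var values = new List<string>();

        using (XmlReader r = XmlReader.Create(input, new XmlReaderSettings(), context))
        {
            while (r.Read())
            {
                values.Add(r.Value);
            }
        }
        return values;
    }
}
```

It could be called as ParserContextSketch.ReadValues(File.OpenRead(fileName), Encoding.GetEncoding("ISO-8859-9")). Unlike CheckCharacters, this changes how the bytes are decoded rather than which characters are accepted.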

ChrisF
I found CheckCharacters too, but it didn't help, at least in my case.
Armagan
+10  A: 

To force .NET to read the file in as ISO-8859-9, just use one of the many XmlReader.Create overloads, e.g.

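// Requires: using System; using System.IO; using System.Text; using System.Xml;
// The StreamReader decodes the bytes as ISO-8859-9 before XmlReader sees them.
using (StreamReader sr = new StreamReader(fileName, Encoding.GetEncoding("ISO-8859-9")))
using (XmlReader r = XmlReader.Create(sr)) {
    while (r.Read()) {
        Console.WriteLine(r.Value);
    }
}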

However, that may not work because, IIRC, the W3C XML standard says that once the XML declaration has been read, a compliant parser should immediately switch to the encoding named in the declaration, regardless of what encoding it was using before. In your case the XML file has no XML declaration, so the parser might still fall back to UTF-8 and fail. I may be talking nonsense here, so try it and see. :-)
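
One way to try it without touching a real file is to feed the reader in-memory bytes that have no XML declaration. This is only a sketch: the class name ExplicitEncodingCheck and the helper ReadValues are hypothetical, and it assumes the StreamReader decodes the bytes to characters before XmlReader sees them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class ExplicitEncodingCheck
{
    // Hypothetical helper: decodes the bytes with the given encoding first,
    // then hands the already-decoded characters to XmlReader.
    public static List<string> ReadValues(Stream input, Encoding encoding)
    {
        var values = new List<string>();

        using (var text = new StreamReader(input, encoding))
        using (XmlReader r = XmlReader.Create(text))
        {
            while (r.Read())
            {
                values.Add(r.Value);
            }
        }
        return values;
    }
}
```

For example, encode a short document such as <a>ş</a> with Encoding.GetEncoding("ISO-8859-9").GetBytes(...), wrap the bytes in a MemoryStream, and pass the same encoding to ReadValues. Note that on .NET Core and later, Encoding.GetEncoding("ISO-8859-9") only succeeds after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called; on .NET Framework it is available out of the box.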

Christian Hayter
Yeah, I don't think just changing the encoding of the StreamReader will work...
Noldorin
I just tried, this worked. Thanks!
Armagan