I am opening an XML file with the .NET XmlReader and saving it under another filename, and the DOCTYPE declaration changes between the two files. The newly saved file is still valid XML, but I was wondering why it insists on changing the original declaration.

    ' Reader settings: no resolver (the external DTD is not fetched), DOCTYPE allowed.
    Dim oXmlSettings As Xml.XmlReaderSettings = New Xml.XmlReaderSettings()
    oXmlSettings.XmlResolver = Nothing
    oXmlSettings.CheckCharacters = False
    oXmlSettings.ProhibitDtd = False
    oXmlSettings.IgnoreWhitespace = True

    ' Load the original file through the reader, then save it under the new name.
    Dim oXmlDoc As XmlReader = XmlReader.Create(pathToOriginalXml, oXmlSettings)
    Dim oDoc As XmlDocument = New XmlDocument()
    oDoc.Load(oXmlDoc)
    oDoc.Save(pathToNewXml)

The following (in the original document):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">

becomes (notice the [ ] characters at the end):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd"[]>
+1  A: 

Probably the library parses the DOCTYPE declaration into an internal structure and then serializes that structure back to text. It doesn't store the original string form, so details such as an empty internal subset can come out written differently.
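To illustrate the round trip, here is a minimal C# sketch of the serializing half. It assumes the loaded document ends up holding an empty internal subset rather than a null one, which is what the [] in the output suggests. The class and method names (DocTypeSubsetDemo, WriteDocType) are made up for the example; only XmlTextWriter.WriteDocType is the real API.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DocTypeSubsetDemo
{
    // Hypothetical helper: serializes only a DOCTYPE declaration
    // with the given internal subset and returns the resulting text.
    public static string WriteDocType(string subset)
    {
        var buffer = new StringWriter();
        using (var writer = new XmlTextWriter(buffer))
        {
            writer.WriteDocType(
                "html",
                "-//W3C//DTD XHTML Basic 1.1//EN",
                "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd",
                subset);
        }
        return buffer.ToString();
    }

    public static void Main()
    {
        // null subset: the declaration is written without brackets
        Console.WriteLine(WriteDocType(null));
        // empty-string subset: the declaration ends with []>
        Console.WriteLine(WriteDocType(string.Empty));
    }
}
```

The writer treats a null subset as "no internal subset" and an empty string as "an internal subset with nothing in it", so the two calls produce different text for what is logically the same declaration.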

David Norman
+3  A: 

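There is a bug in System.Xml when you set XmlDocument.XmlResolver = null: the internal subset of the DOCTYPE is passed to the writer as an empty string instead of null, so the writer emits []. The workaround is to create a custom XmlTextWriter: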

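    // Requires: using System; using System.Text; using System.Xml;
    // Declared private, so it is meant to be nested inside the class that uses it.
    private class NullSubsetXmlTextWriter : XmlTextWriter
    {
        // fileName is the path of the file to write.
        public NullSubsetXmlTextWriter(String fileName, Encoding encoding)
            : base(fileName, encoding)
        {
        }

        public override void WriteDocType(string name, string pubid, string sysid, string subset)
        {
            // An empty internal subset would be written as "[]"; null omits the brackets.
            if (subset == String.Empty)
            {
                subset = null;
            }
            base.WriteDocType(name, pubid, sysid, subset);
        }
    }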

In your code, create a new NullSubsetXmlTextWriter(pathToNewXml, Encoding.UTF8) and pass that object to the oDoc.Save() method.
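Put together, that step might look like the following C# sketch (the question's code is VB.NET, but the calls map one to one). The wrapper class SaveWithoutEmptySubset and the method Resave are hypothetical names for the example, and the writer class is repeated in condensed form only so the sketch compiles on its own. DtdProcessing.Parse assumes .NET 4.0 or later; on older frameworks use ProhibitDtd = false as in the question.

```csharp
using System.Text;
using System.Xml;

public static class SaveWithoutEmptySubset
{
    // Same override as in the answer, condensed.
    private class NullSubsetXmlTextWriter : XmlTextWriter
    {
        public NullSubsetXmlTextWriter(string fileName, Encoding encoding)
            : base(fileName, encoding) { }

        public override void WriteDocType(string name, string pubid, string sysid, string subset)
        {
            base.WriteDocType(name, pubid, sysid, subset == string.Empty ? null : subset);
        }
    }

    // Hypothetical helper: loads the original file and saves it through the custom writer.
    public static void Resave(string pathToOriginalXml, string pathToNewXml)
    {
        var settings = new XmlReaderSettings
        {
            XmlResolver = null,
            CheckCharacters = false,
            DtdProcessing = DtdProcessing.Parse, // assumption: .NET 4.0+; replaces ProhibitDtd = false
            IgnoreWhitespace = true
        };

        var doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(pathToOriginalXml, settings))
        {
            doc.Load(reader);
        }

        using (var writer = new NullSubsetXmlTextWriter(pathToNewXml, Encoding.UTF8))
        {
            // Optional: XmlTextWriter does not indent unless asked to.
            writer.Formatting = Formatting.Indented;
            doc.Save(writer);
        }
    }

    public static void Main(string[] args)
    {
        Resave(args[0], args[1]);
    }
}
```

Wrapping both the reader and the writer in using blocks makes sure the input file is released and the output is flushed before the method returns.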

There is a Microsoft support case about this issue; it describes the workaround but doesn't provide the code.

Mo Flanagan
A: 

Thanks to Mo for the XmlTextWriter workaround. This solved my problem.

Dave Whiffin