Could someone explain this behaviour to me?

If you execute the snippet at the bottom of the post with the first string, it returns the exact same string as the one used for the input; that's what I expected.

input 1:

<?xml version='1.0' encoding='UTF-8'?>
<Company>
  <Creator>Me</Creator>
  <CreationDateTime>2010-01-25T21:58:32.493</CreationDateTime>
  <Contacts>
    <Contact>
      <ContactID>365</ContactID>
    </Contact>
  </Contacts>
</Company>

output 1:

<?xml version='1.0' encoding='UTF-8'?>
<Company>
  <Creator>Me</Creator>
  <CreationDateTime>2010-01-25T21:58:32.493</CreationDateTime>
  <Contacts>
    <Contact>
      <ContactID>365</ContactID>
    </Contact>
  </Contacts>
</Company>

Now if you use the second line (const string xml), which is exactly the same string but on one line instead of two, it returns the following:

input 2:

<?xml version='1.0' encoding='UTF-8'?>
<Company>
  <Creator>Me</Creator>
  <CreationDateTime>2010-01-25T21:58:32.493</CreationDateTime>
  <Contacts>
    <Contact>
      <ContactID>365</ContactID>
    </Contact>
  </Contacts>
</Company>

output 2:

<?xml version='1.0' encoding='UTF-8'?>
<Creator>Me</Creator>2010-01-25T21:58:32.493 
<Contacts>
  <Contact>
    <ContactID>365</ContactID>
  </Contact>
</Contacts>

The only difference between the two is that the first one has a line break right after the XML declaration. As you can see, though, the second output is missing the parent tag (Company) and the third tag (CreationDateTime). Any thoughts?

Here is the code I used:

public void XmlReader_Eats_Tags_IsTrue()
    {
        // First variant: the XML is on two lines, with a line break right after the XML declaration.
        const string xml = @"<?xml version='1.0' encoding='UTF-8'?>
<Company><Creator>Me</Creator><CreationDateTime>2010-01-25T21:58:32.493</CreationDateTime><Contacts><Contact><ContactID>365</ContactID></Contact></Contacts></Company>";

        // Second variant: the whole XML is on one line
        //const string xml = @"<?xml version='1.0' encoding='UTF-8'?><Company><Creator>Me</Creator><CreationDateTime>2010-01-25T21:58:32.493</CreationDateTime><Contacts><Contact><ContactID>365</ContactID></Contact></Contacts></Company>";

        BufferedStream stream = new BufferedStream(new MemoryStream());
        stream.Write(Encoding.ASCII.GetBytes(xml), 0, xml.Length);
        stream.Seek(0, SeekOrigin.Begin);
        StreamReader streamReaderXml = new StreamReader(stream);

        XmlReader xmlR = XmlReader.Create(streamReaderXml);

        XmlReaderSettings xmlReaderset = 
                         new XmlReaderSettings{ValidationType = ValidationType.Schema};
        xmlReaderset.Schemas.ValidationEventHandler += ValidationCallBack;

        MemoryStream ms = new MemoryStream();
        XmlWriterSettings xmlWriterSettings = 
                          new XmlWriterSettings{
                                  Encoding = new UTF8Encoding(false),
                                  ConformanceLevel = ConformanceLevel.Fragment
                          };

        using (XmlWriter xmlTw = XmlWriter.Create(ms, xmlWriterSettings))
        {
            using (XmlReader xmlRead = XmlReader.Create(xmlR, xmlReaderset))
            {
                int i = 0;
                while (xmlRead.Read())
                {
                    Console.WriteLine("{0}:{1}; node type: {2}", i, xmlRead.Name, xmlRead.NodeType);
                    // Reads the whole file and will call the validation handler subroutine if an error is detected.
                    xmlTw.WriteNode(xmlRead, true);
                    i++;
                }

                xmlTw.Flush();
                xmlRead.Close();
            }
            string xmlString = Encoding.UTF8.GetString(ms.ToArray());
            Console.WriteLine(xmlString);
        }
    }
+5  A: 

The problem is that you're using XmlWriter.WriteNode(reader, true) and calling XmlReader.Read(). WriteNode already moves the reader onto the sibling element, so you're effectively skipping over data when you then call Read again.

I suspect it happens to be working in the first version because you're skipping over whitespace in the second call to Read, and then reading the rest of the document in the second call to WriteNode.
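
One way to apply this is to let WriteNode drive the reader on its own. The following is only a minimal sketch under a few assumptions: the class and method names (`WriteNodeCopy`, `Copy`) are made up for illustration, the schema validation from the question is left out, and a `StringReader` stands in for the stream setup. It relies on the documented behaviour that calling WriteNode on a reader still in its initial state copies everything up to the end of the input.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class WriteNodeCopy
{
    // Hypothetical helper: copies an XML string through XmlReader/XmlWriter
    // without calling Read() in between, so no node gets skipped.
    public static string Copy(string xml)
    {
        var writerSettings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false),
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var ms = new MemoryStream();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(ms, writerSettings))
        {
            // The reader has not been advanced yet (initial state), so a single
            // WriteNode call copies every node and leaves the reader at EOF.
            writer.WriteNode(reader, true);
        }
        return Encoding.UTF8.GetString(ms.ToArray());
    }

    public static void Main()
    {
        const string oneLine = "<?xml version='1.0' encoding='UTF-8'?><Company><Creator>Me</Creator><CreationDateTime>2010-01-25T21:58:32.493</CreationDateTime><Contacts><Contact><ContactID>365</ContactID></Contact></Contacts></Company>";
        Console.WriteLine(Copy(oneLine));
    }
}
```

If per-node logging like the `Console.WriteLine` in the question is still wanted, an alternative is to loop on `!reader.EOF` and call only WriteNode inside the loop, since each call already advances the reader.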

Jon Skeet
You are absolutely right; if I set the `IgnoreWhitespace` property to `true` for the `XmlReaderSettings`, both of the examples skip the tags. Thanks for enlightening me.
Nip
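
The `IgnoreWhitespace` observation can be checked with a small reproduction. This is a hedged sketch rather than the code from the question: the names `IgnoreWhitespaceDemo` and `CopyWithReadLoop` are hypothetical, validation is omitted, and the loop deliberately keeps the Read() plus WriteNode() combination so the skipping is visible. It uses the exact XML from the question; with a differently shaped document the same loop may fail in other ways.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class IgnoreWhitespaceDemo
{
    // Hypothetical helper that keeps the problematic loop from the question:
    // Read() advances the reader, then WriteNode() advances it again.
    public static string CopyWithReadLoop(string xml, bool ignoreWhitespace)
    {
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        var writerSettings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false),
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var ms = new MemoryStream();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(ms, writerSettings))
        {
            while (reader.Read())
            {
                writer.WriteNode(reader, true);
            }
        }
        return Encoding.UTF8.GetString(ms.ToArray());
    }

    public static void Main()
    {
        // Two-line variant: line break right after the XML declaration.
        const string twoLines = "<?xml version='1.0' encoding='UTF-8'?>\n<Company><Creator>Me</Creator><CreationDateTime>2010-01-25T21:58:32.493</CreationDateTime><Contacts><Contact><ContactID>365</ContactID></Contact></Contacts></Company>";

        // With whitespace nodes reported, the extra Read() only consumes the line break.
        Console.WriteLine(CopyWithReadLoop(twoLines, false));
        // With whitespace ignored, the extra Read() consumes the Company start tag instead.
        Console.WriteLine(CopyWithReadLoop(twoLines, true));
    }
}
```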