tags:

views:

130

answers:

2

It seems every time I use an XMLReader, I end up with a bunch of trial and error trying to figure out what I'm about to read versus what I'm reading versus what I just read. I always figure it out in the end, but I still, after using it numerous times, don't seem to have a firm grasp of what an XMLReader is actually doing when I call the various functions. For example, when I call Read the first time, if it reads an element start tag, is it now at the end of the element tag, or ready to begin reading the element's attributes? Does it know the values of the attributes yet if I call GetAttribute? What will happen if I call ReadStartElement at this point? Will it finish reading the start element, or look for the next one, skipping all the attributes? What if I want to read a number of elements -- what's the best way to try to read the next element and determine what its name is. Will Read followed by IsStartElement work, or will IsStartElement be returning information about the node following the element I just read?

As you can see I really am lacking an understanding of where an XMLReader is at during the various phases of its reading and how its state is affected by various read functions. Is there some simple pattern that I've simply failed to notice?

Here's another example of the problem (taken from the responses):

string input = "<machine code=\"01\">The Terminator" +
   "<part code=\"01a\">Right Arm</part>" +
   "<part code=\"02\">Left Arm</part>" +
   "<part code=\"03\">Big Toe</part>" +
   "</machine>";

using (System.IO.StringReader sr = new System.IO.StringReader(input))
{
   using (XmlTextReader reader = new XmlTextReader(sr))
   {
      reader.WhitespaceHandling = WhitespaceHandling.None;
      reader.MoveToContent();

      while(reader.Read())
      {
         if (reader.Name.Equals("machine") && (reader.NodeType == XmlNodeType.Element))
         {
            Console.Write("Machine code {0}: ", reader.GetAttribute("code"));
            Console.WriteLine(reader.ReadElementString("machine"));
         }
         if(reader.Name.Equals("part") && (reader.NodeType == XmlNodeType.Element))
         {
            Console.Write("Part code {0}: ", reader.GetAttribute("code"));
            Console.WriteLine(reader.ReadElementString("part"));
         }
      }
   }
}

First problem, the machine node is skipped completely. MoveToContent seems to move to the content of the machine element causing it to never be parsed. Furthermore, if you skip MoveToContent, you get an error: "'Element' is an invalid XmlNodeType." trying to ReadElementString, which I can't quite explain.

Next problem is, while reading the first part element, ReadElementString seems to position the reader at the beginning of the next part element after reading. This causes the reader.Read at the beginning of the next loop to skip over the next part element jumping right to the last part element. So the final output of this code is:

Part code 01a: Right Arm

Part code 03: Big Toe

This is a prime example of the confusign behavior of XMLReader that I'm trying to understand.

+3  A: 

Here's the thing... I've written a fair amount of serialization code (including a lot of xml processing), and I find myself in exactly the same boat as you. I have a very simple piece of guidance, therefore: don't.

I'll happily use XmlWriter as a way to write xml quickly, but I'd walk over hot coals before choosing to implement IXmlSerializable another time - I'd simply write a separate DTO and map the data into that; it also means the schema (for "mex", "wsdl", etc) comes for free.

Marc Gravell
Can you tell me what DTO means por favor?
Sergio Tapia
Data Transfer Object - See http://en.wikipedia.org/wiki/Data_transfer_object
Kris C
Essentially an object model geared for serialization / transport - for example if your *main* object model is immutable (no "setters") the DTO might have read/write properties (since that works well with some serializers), or a flattened hierarchy.
Marc Gravell
What *do* you use to parse/read XML then? XDocument? XmlDocument? That's less efficient, right? Or perhaps the more relevant question is, how do you serialize and deserialize objects, more specifically? (I have a very simple object that I'm writing to a very small file.)
BlueMonkMN
@BlueMonkMN - In the *general* sense, I would probably use protobuf-net ;-p If it had to be xml, then either `XmlSerializer` with a DTO that matched the expected layout (so no need for extra work), or `XDocument`.
Marc Gravell
Would it be appropriate to include a code sample of how a DTO is used to serve up XML serialization in your response so we don't have to go wander around aimlessly for a while researching DTOs before stumbling on the part that talks about XML? I used (DTO) XML serialization once long ago, but generally try to avoid it because it feels like a hassle to go apply attributes to all the class members to make it work when I'd really like to have all the serialization code in one place. (If you have a hierarchy of classes in different files, it can be a pain to follow.)
BlueMonkMN
+1  A: 

My latest solution (which works for my current case) is to stick with Read(), IsStartElement(name) and GetAttribute(name) in implementing a state machine.

using (System.Xml.XmlReader xr = System.Xml.XmlTextReader.Create(stm))
{
   employeeSchedules = new Dictionary<string, EmployeeSchedule>();
   EmployeeSchedule emp = null;
   WeekSchedule sch = null;
   TimeRanges ranges = null;
   TimeRange range = null;
   while (xr.Read())
   {
      if (xr.IsStartElement("Employee"))
      {
         emp = new EmployeeSchedule();
         employeeSchedules.Add(xr.GetAttribute("Name"), emp);
      }
      else if (xr.IsStartElement("Unavailable"))
      {
         sch = new WeekSchedule();
         emp.unavailable = sch;
      }
      else if (xr.IsStartElement("Scheduled"))
      {
         sch = new WeekSchedule();
         emp.scheduled = sch;
      }
      else if (xr.IsStartElement("DaySchedule"))
      {
         ranges = new TimeRanges();
         sch.daySchedule[int.Parse(xr.GetAttribute("DayNumber"))] = ranges;
         ranges.Color = ParseColor(xr.GetAttribute("Color"));
         ranges.FillStyle = (System.Drawing.Drawing2D.HatchStyle)
            System.Enum.Parse(typeof(System.Drawing.Drawing2D.HatchStyle),
            xr.GetAttribute("Pattern"));
      }
      else if (xr.IsStartElement("TimeRange"))
      {
         range = new TimeRange(
            System.Xml.XmlConvert.ToDateTime(xr.GetAttribute("Start"),
            System.Xml.XmlDateTimeSerializationMode.Unspecified),
            new TimeSpan((long)(System.Xml.XmlConvert.ToDouble(xr.GetAttribute("Length")) * TimeSpan.TicksPerHour)));
         ranges.Add(range);
      }
   }
   xr.Close();
}

After Read, IsStartElement will return true if you just read a start element (optinally checking the name of the element read), and you can access all the attributes of that element immediately. If all you need to read is elements and attributes, this is pretty straightforward.

Edit The new example posted in the question poses some other challenges. The correct way to read that XML seems to be like this:

using (System.IO.StringReader sr = new System.IO.StringReader(input))
{
   using (XmlTextReader reader = new XmlTextReader(sr))
   {
      reader.WhitespaceHandling = WhitespaceHandling.None;

      while(reader.Read())
      {
         if (reader.Name.Equals("machine") && (reader.NodeType == XmlNodeType.Element))
         {
            Console.Write("Machine code {0}: ", reader.GetAttribute("code"));
            Console.WriteLine(reader.ReadString());
         }
         if(reader.Name.Equals("part") && (reader.NodeType == XmlNodeType.Element))
         {
            Console.Write("Part code {0}: ", reader.GetAttribute("code"));
            Console.WriteLine(reader.ReadString());
         }
      }
   }
}

You have to use ReadString instead of ReadElementString in order to avoid reading the end element and skipping into the beginning of the next element (let the following Read() skip over the end element so it doesn't skip over the next start element). Still this seems somewhat confusing and potentially unreliable, but it works for this case.

After some additional thought, my opinion is that XMLReader is just too confusing if you use any methods to read content other than the Read method. I think it's much simpler if you confine yourself to the Read method to read from the XML stream. Here's how it would work with the new example (once again, it seems IsStartElement, GetAttribute and Read are the key methods, and you end up with a state machine):

while(reader.Read())
{
   if (reader.IsStartElement("machine"))
   {
      Console.Write("Machine code {0}: ", reader.GetAttribute("code"));
   }
   if(reader.IsStartElement("part"))
   {
      Console.Write("Part code {0}: ", reader.GetAttribute("code"));
   }
   if (reader.NodeType == XmlNodeType.Text)
   {
      Console.WriteLine(reader.Value);
   }
}
BlueMonkMN