It's Friday and my mind already seems to have moved to weekend thinking.

Given this XML structure:

<?xml version="1.0" encoding="utf-8"?>
<results requiredAttribute="somedatahere">
  <entry>
    <!-- Xml structure in here -->
  </entry>
  <entry>
    <!-- Xml structure in here -->
  </entry>
  <entry>
    <!-- Xml structure in here -->
  </entry>
</results>

And this code (cut down to the core) that uses an XmlReader to read the data and asynchronously return it:

            response = (HttpWebResponse)request.GetResponse();

            using (var reader = XmlReader.Create(response.GetResponseStream()))
            {
                Logger.Info("Collector: Before attempt to read data for {0}", url);

                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "entry")
                    {
                        var el = XElement.ReadFrom(reader) as XElement;
                        if (el != null)
                            yield return el;
                    }
                }
            }

What is the easiest way to retrieve the value from the attribute requiredAttribute?

The key point is that I never want to read the full XML file in, as the file could be very big. Also, the data is coming from an HTTP stream, so you can't always guarantee that the data is complete and, consequently, that the outer results element is well formed. This seems to rule out reading the results element and then iterating through its children.

A: 

Stick with a purely XmlReader-based approach: until it hits the malformation, it will give you parsed content.

Any other approach (XPathDocument, XElement, XmlDocument) will try to parse the whole document first, so you will just get the applicable exception.
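
As a rough sketch of what that could look like for the XML in the question: the loop below reads `requiredAttribute` straight from the `results` start tag with `XmlReader.GetAttribute`, so the closing `</results>` tag is never needed, and then hands back one `entry` at a time. The class name `EntryStreamSketch`, the method `ReadEntries` and the `onRequiredAttribute` callback are made up for illustration. The only LINQ to XML call is `XNode.ReadFrom`, which consumes a single complete `entry` subtree from the reader and not the whole document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class EntryStreamSketch
{
    // Hypothetical helper (not from the original post): reports the attribute
    // through a callback, then lazily yields each <entry> element.
    public static IEnumerable<XElement> ReadEntries(Stream stream, Action<string> onRequiredAttribute)
    {
        using (var reader = XmlReader.Create(stream))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "results")
                {
                    // GetAttribute only looks at the start tag the reader is on;
                    // it returns null if the attribute is missing.
                    onRequiredAttribute(reader.GetAttribute("requiredAttribute"));
                    reader.Read();
                }
                else if (reader.NodeType == XmlNodeType.Element && reader.Name == "entry")
                {
                    // ReadFrom consumes this one entry and leaves the reader on the
                    // node after </entry>, so no extra Read() is issued here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

A caller would enumerate `ReadEntries(response.GetResponseStream(), value => ...)` inside a `try`/`catch (XmlException)`. If the stream is cut off, the entries parsed before the break have already been returned by the time the exception surfaces.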

Richard
Sorry, but I don't see how your answer explains the easiest way to return the attribute value when using yield to asynchronously return the retrieved data.
ChoccyButton
@Choccy the `yield` has nothing to do with it. With malformed XML, the `XElement` creation will fail.
Richard
The code is in use already and works fine. The problem comes if you attempt to read the results element, as that seems to try to read the whole element. If you ignore that element and start reading at the entry element level, then the reader just reads one entry at a time, which works fine for elements but means you miss the required attribute.
ChoccyButton
@ChoccyButton: exactly. All the non-`XmlReader` APIs read complete elements, and the complete element must be well formed. The *only* route to reading parts of elements is `XmlReader` (which is exactly what the sample code is doing with `entry` elements).
Richard
A: 
if (reader.NodeType == XmlNodeType.Element)
{
    if (reader.Name == "results")
    {
        if (reader.MoveToAttribute("requiredAttribute") && reader.ReadAttributeValue())
            yield return reader.Value;
    }
    if (reader.Name == "entry")
    {
        ...
    }
}

Test Program

using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        try
        {
            foreach (object value in Read())
                Console.WriteLine(value);
        }
        catch (XmlException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }

    static IEnumerable<object> Read()
    {
        using (var file = File.OpenRead("Test.xml"))
        {
            var reader = XmlReader.Create(file, new XmlReaderSettings { IgnoreComments = true });
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    yield return reader.Name;

                    if (reader.Name == "results")
                    {
                        if (reader.MoveToAttribute("requiredAttribute") && reader.ReadAttributeValue())
                            yield return reader.Value;
                    }
                }
            }
        }
    }
}
Tergiver
That doesn't work, as it attempts to read in the whole results element. That fails if the element hasn't been closed, and it doesn't let the reader move on to the entry elements. At least, it doesn't seem to in my tests; I may be doing something wrong.
ChoccyButton
Sure it does. Look at the edited version.
Tergiver
OK, there must be something wrong with my code then. I've tried what you're suggesting above, and the second if is never hit. It goes into the first if, which reads the full results element, so the reader never gets to the entry elements.
ChoccyButton
Run the program I added above on your sample XML file. Edit the sample file to remove the closing </results> tag to simulate an incomplete stream. You should see that it works just fine. So yes, there is something you're doing wrong. Maybe you could provide something more than, "It doesn't work."
Tergiver
Then I looked at your code again. You are attempting to use XElement to read the XML. You cannot mix and match them as XElement will read the entire stream as Richard said. You have to fully parse using the XmlReader.
Tergiver