I am writing a class library which abstracts data contained in XML files on a web site. Each XML file uses the same root element: page. The descendant(s) of page depend on the specific file that I am downloading. For example:

<!-- http://.../groups.xml -->
<page url="/groups.xml">
  <groups>
    <group id="1" >
      <members>
        <member name="Adrian" />
        <member name="Sophie" />
        <member name="Roger" />
      </members>
    </group>
  </groups>
</page>

<!-- http://.../project.xml?n=World%20Domination -->
<page url="/project.xml">
  <projectInfo>
    <summary classified="true" deadline="soon" />
    <team>
      <member name="Pat" />
      <member name="George" />
    </team>
  </projectInfo>
</page>

There are also several additional XML files that I would like to download and process, eventually. For that reason, I have been trying to come up with a nice, clean way to deserialize the data. I've tried a few approaches, but each approach leaves me feeling a little dirty when I look back over my code. My latest incarnation utilizes the following method:

using System.Net;
using System.Xml;
using System.Xml.Serialization;

internal class Provider
{
    /// <summary>Download information from the host.</summary>
    /// <typeparam name="T">The type of data being downloaded.</typeparam>
    internal T Download<T>(string url) where T : IXmlSerializable, new()
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = XmlReader.Create(response.GetResponseStream()))
            {
                // Skip the XML prolog (declaration and stylesheet reference).
                reader.MoveToContent();

                // Skip the `page` element.
                if (reader.LocalName == "page") reader.ReadStartElement();

                var serializer = new XmlSerializer(typeof(T));
                return (T)serializer.Deserialize(reader);
            }
        }
        catch (WebException)
        {
            // The tubes are clogged. Every code path has to return a value,
            // so fall back to default(T).
            return default(T);
        }
    }
}

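[XmlRoot("groups")]
public class GroupList : List<Group>, IXmlSerializable
{
    // IXmlSerializable also requires GetSchema and WriteXml.
    public System.Xml.Schema.XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        // Serialization is not needed here; the data is only downloaded.
        throw new System.NotSupportedException();
    }

    public void ReadXml(XmlReader reader)
    {
        reader.ReadToDescendant("group");

        do
        {
            // Attribute values are strings, so convert rather than cast.
            var id = XmlConvert.ToInt32(reader["id"]);
            var group = new Group(id);

            if (reader.ReadToDescendant("member"))
            {
                do
                {
                    var member = new Member(reader["name"], group);
                    group.Add(member);
                } while (reader.ReadToNextSibling("member"));
            }

            // The class already is a List<Group>, so add to it directly
            // instead of to a separate private list.
            Add(group);
        } while (reader.ReadToNextSibling("group"));

        reader.Read();
    }
}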

This works, but I feel like there is a better way that I'm not seeing. I tried using the xsd.exe utility when I started this project. While it would minimize the amount of code for me to write, it did not feel like the ideal solution. It would be the same approach that I'm using now -- I would just get there faster. I'm looking for a better solution. All pages have the page element in common -- isn't there a way to take advantage of that? Would it be possible to have a serializable container class Page that could contain a combination of other objects depending on the file downloaded? Are there any simpler ways to accomplish this?

+4  A: 

.NET provides an "xsd.exe" utility on the command line.

Run xsd.exe (xmlfilename) on your original XML file and it'll derive an XML schema (xsd) from your XML data file.

Run xsd.exe (xsd file name) /C and it'll create a C# class into which such an XML file can be deserialized.

Of course, since it only has a single XML file to go on, the XML schema that xsd.exe derives won't be perfect, but it could be a quick'n'easy starting point for you.
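To give a rough idea of where this leads, here is a minimal hand-written sketch in the attribute style that xsd.exe emits. It is not actual tool output: the class and property names (Page, Content, GroupList, ProjectInfo, Summary, Demo) are assumptions made up for illustration, and the element and attribute names are taken from the two sample files in the question. It also shows one way a shared Page container could hold different content depending on the file, by putting several XmlElement attributes on one object-typed property.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes modelled on the sample XML in the question.
[XmlRoot("page")]
public class Page
{
    [XmlAttribute("url")]
    public string Url { get; set; }

    // The name of the child element selects the CLR type to deserialize into.
    [XmlElement("groups", typeof(GroupList))]
    [XmlElement("projectInfo", typeof(ProjectInfo))]
    public object Content { get; set; }
}

public class GroupList
{
    [XmlElement("group")]
    public List<Group> Groups { get; set; }
}

public class Group
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlArray("members")]
    [XmlArrayItem("member")]
    public List<Member> Members { get; set; }
}

public class Member
{
    [XmlAttribute("name")]
    public string Name { get; set; }
}

public class ProjectInfo
{
    [XmlElement("summary")]
    public Summary Summary { get; set; }

    [XmlArray("team")]
    [XmlArrayItem("member")]
    public List<Member> Team { get; set; }
}

public class Summary
{
    [XmlAttribute("classified")]
    public bool Classified { get; set; }

    [XmlAttribute("deadline")]
    public string Deadline { get; set; }
}

public static class Demo
{
    // Deserialize a whole <page> document from a string.
    public static Page Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Page));
        using (var reader = new StringReader(xml))
        {
            return (Page)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        var page = Parse(
            "<page url=\"/groups.xml\"><groups><group id=\"1\"><members>" +
            "<member name=\"Adrian\" /></members></group></groups></page>");

        // The caller checks which kind of content came back.
        var groups = page.Content as GroupList;
        if (groups != null)
            Console.WriteLine(groups.Groups[0].Members[0].Name);
    }
}
```

With something like this, the download method would deserialize Page itself instead of skipping the page element by hand, and each new file type would only need one more XmlElement attribute on Content plus its own class.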

marc_s
I tried using that utility when I started this project. While it *would* minimize the amount of code for me to write, it did not feel like the ideal solution. It would be the same approach that I'm using now -- I would just get there faster. I was hoping for a smarter solution.
Justin R.
What about it didn't feel "ideal"? How should a solution be "smarter"? Given an XML file, you have a defined structure. How smart do you need to be to parse that? That's what deserialization is good at.
marc_s