tags:

views:

3473

answers:

3

I'm trying to read the following Xml document as fast as I can and let additional classes manage the reading of each sub block.

<ApplicationPool>
    <Accounts>
        <Account>
            <NameOfKin></NameOfKin>
            <StatementsAvailable>
                <Statement></Statement>
            </StatementsAvailable>
        </Account>
    </Accounts>
</ApplicationPool>

However, I'm trying to use the XmlReader object to read each Account and subsequently the "StatementsAvailable". Do you suggest using XmlReader.Read and check each element and handle it?

I've thought of seperating my classes to handle each node properly. So theres an AccountBase class that accepts a XmlReader instance that reads the NameOfKin and several other properties about the account. Then I was wanting to interate through the Statements and let another class fill itself out about the Statement (and subsequently add it to an IList).

Thus far I have the "per class" part done by doing XmlReader.ReadElementString() but I can't workout how to tell the pointer to move to the StatementsAvailable element and let me iterate through them and let another class read each of those proeprties.

Sounds easy!

A: 

For sub-objects, ReadSubtree() gives you an xml-reader limited to the sub-objects, but I really think that you are doing this the hard way. Unless you have very specific requirements for handling unusual / unpredicatable xml, use XmlSerializer (perhaps coupled with sgen.exe if you really want).

XmlReader is... tricky. Contrast to:

using System;
using System.Collections.Generic;
using System.Xml.Serialization;
public class ApplicationPool {
    private readonly List<Account> accounts = new List<Account>();
    public List<Account> Accounts {get{return accounts;}}
}
public class Account {
    public string NameOfKin {get;set;}
    private readonly List<Statement> statements = new List<Statement>();
    public List<Statement> StatementsAvailable {get{return statements;}}
}
public class Statement {}
static class Program {
    static void Main() {
        XmlSerializer ser = new XmlSerializer(typeof(ApplicationPool));
        ser.Serialize(Console.Out, new ApplicationPool {
            Accounts = { new Account { NameOfKin = "Fred",
                StatementsAvailable = { new Statement {}, new Statement {}}}}
        });
    }
}
Marc Gravell
+8  A: 

My experience of XmlReader is that it's very easy to accidentally read too much. I know you've said you want to read it as quickly as possible, but have you tried using a DOM model instead? I've found that LINQ to XML makes XML work much much easier.

If your document is particularly huge, you can combine XmlReader and LINQ to XML by creating an XElement from an XmlReader for each of your "outer" elements in a streaming manner: this lets you do most of the conversion work in LINQ to XML, but still only need a small portion of the document in memory at any one time. Here's some sample code (adapted slightly from this blog post):

static IEnumerable<XElement> SimpleStreamAxis(string inputUrl,
                                              string elementName)
{
  using (XmlReader reader = XmlReader.Create(inputUrl))
  {
    reader.MoveToContent();
    while (reader.Read())
    {
      if (reader.NodeType == XmlNodeType.Element)
      {
        if (reader.Name == elementName)
        {
          XElement el = XNode.ReadFrom(reader) as XElement;
          if (el != null)
          {
            yield return el;
          }
        }
      }
    }
  }
}

I've used this to convert the StackOverflow user data (which is enormous) into another format before - it works very well.

Jon Skeet
A: 

We do this kind of XML parsing all the time. The key is defining where the parsing method will leave the reader on exit. If you always leave the reader on the next element following the element that was first read then you can safely and predictably read in the XML stream. So if the reader is currently indexing the <Account> element, after parsing the reader will index the </Accounts> closing tag.

The parsing code looks something like this:

public class Account
{
    string _accountId;
    string _nameOfKin;
    Statements _statmentsAvailable;

    public void ReadFromXml( XmlReader reader )
    {
        reader.MovToContent();

        // Read node attributes
        _accountId = reader.GetAttribute( "accountId" );
        ...

        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {
            if( reader.IsStartElement() )
            {
                switch( reader.Name )
                {
                    // Read element for a property of this class
                    case "NameOfKin":
                        _nameOfKin = reader.ReadElementContentAsString();
                        break;

                    // Starting sub-list
                case "StatementsAvailable":
                    _statementsAvailable = new Statements();
                    _statementsAvailable.Read( reader );
                    break;

                    default:
                        reader.Skip();
                }
            }
            else
            {
                reader.Read();
                break;
            }
        }       
    }
}

The Statements class just reads in the <StatementsAvailable> node

public class Statements
{
    List<Statement> _statements = new List<Statement>();

    public void ReadFromXml( XmlReader reader )
    {
        reader.MoveToContent();
        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {
            if( reader.IsStartElement() )
            {
                if( reader.Name == "Statement" )
                {
                    var statement = new Statement();
                    statement.ReadFromXml( reader );
                    _statements.Add( statement );               
                }
                else
                {
                    reader.Skip();
                }
            }
            else
            {
                reader.Read();
                break;
            }
        }
    }
}

The Statement class would look very much the same

public class Statement
{
    string _satementId;

    public void ReadFromXml( XmlReader reader )
    {
        reader.MovToContent();

        // Read noe attributes
        _statementId = reader.GetAttribute( "statementId" );
        ...

        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {           
            ....same basic loop
        }       
    }
}
Paul Alexander