ansaurus

Question

Parsing XML in C# for specific content

Answer 1

+2 A:

The following article might be of use:

http://www.java2s.com/Code/CSharp/XML/FindElementswithanXPathSearch.htm

Hatch 2010-08-18 14:44:56

This works but is not optimal for read-only access. The sample uses `XmlDocument` which builds up a complete DOM tree in memory that you usually won't need.

0xA3 2010-08-18 14:57:13

Answer 2

+7 A:

Use LINQ-to-XML:

var doc = XDocument.Parse(@"<Company>
    <Owner>Bob</Owner>
    <Contact>
        <address> -1 Infinite Loop </address>
        <phone>
            <LandLine>(000) 555-5555</LandLine>
            <Fax> (000) 555-5556 </Fax>
        </phone>
        <email> [email protected] </email>
    </Contact>
</Company>");

var phone = doc.Root.Element("Contact").Element("phone");

Console.WriteLine((string)phone.Element("LandLine"));
Console.WriteLine((string)phone.Element("Fax"));

Output:

(000) 555-5555
 (000) 555-5556

dtb 2010-08-18 14:45:51

Note that if Contact is missing, you'll get an exception on the `var phone = ...` line. I like to do things like `var contact = doc.Root.Element("Contact") ?? new XElement("Contact");` so I always have a node returned, and then when I do `var phone = contact.Element("phone") ?? new XElement("phone");` I won't get null reference errors. In the end, I just end up with blank values for the variables. Alternatively, use an XSD to validate the document prior to parsing to ensure the nodes you want exist.

Chad 2010-08-18 14:48:56
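
A minimal sketch of the null-coalescing pattern described in the comment above, assuming the Company/Contact/phone layout from the answer. The class name `SafeContactLookup` and the helper `GetPhoneValue` are hypothetical names introduced only for illustration; trimming the result is also an assumption, not something the comment asks for.

```csharp
using System;
using System.Xml.Linq;

public static class SafeContactLookup
{
    // Hypothetical helper: returns the trimmed text of Contact/phone/<name>,
    // or an empty string when any element along the path is missing.
    public static string GetPhoneValue(XDocument doc, string name)
    {
        XElement root = doc.Root ?? new XElement("Company");
        // Substitute an empty element whenever a level is absent,
        // so the next Element(...) call never runs on null.
        XElement contact = root.Element("Contact") ?? new XElement("Contact");
        XElement phone = contact.Element("phone") ?? new XElement("phone");
        // The explicit string conversion yields null for a missing element instead of throwing.
        return ((string)phone.Element(name) ?? string.Empty).Trim();
    }

    public static void Main()
    {
        var full = XDocument.Parse(
            "<Company><Contact><phone><Fax> (000) 555-5556 </Fax></phone></Contact></Company>");
        var noContact = XDocument.Parse("<Company><Owner>Bob</Owner></Company>");

        Console.WriteLine("[" + GetPhoneValue(full, "Fax") + "]");      // [(000) 555-5556]
        Console.WriteLine("[" + GetPhoneValue(noContact, "Fax") + "]"); // []
    }
}
```

The trade-off is that a missing element and an empty element become indistinguishable; if that difference matters, checking for null explicitly is probably the better choice.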

Note that the `XDocument` class also comes with the overhead of building up a DOM tree in memory; usually not what you need for read-only random access to nodes in the document, especially when you deal with large documents.

0xA3 2010-08-18 15:12:08

Answer 3

+1 A:

The best way to do that is to use XPath. See this article for reference: http://support.microsoft.com/kb/308333

and this article for how to do it: http://www.codeproject.com/KB/cpp/myXPath.aspx

icemanind 2010-08-18 14:46:10

Answer 4

+5 A:

The most lightweight approach for read-only access to specific nodes in an XML document is to use an XPathDocument together with an XPath expression:

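// Requires: using System.Xml.XPath;
XPathDocument xdoc = new XPathDocument(@"C:\sample\document.xml");
// Select the first matching node; the result is null if the path does not exist.
XPathNavigator node = xdoc.CreateNavigator()
    .SelectSingleNode("/Company/Contact/phone/LandLine");
if (node != null)
{
    string landline = node.Value;
}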

0xA3 2010-08-18 14:53:27
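
For trying the XPathDocument approach without a file on disk, here is a small self-contained sketch that loads the XML from a string instead. The class `XPathLookup` and the method `ReadValue` are hypothetical names chosen for this example, and trimming the value is an added assumption.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class XPathLookup
{
    // Hypothetical helper: returns the trimmed value of the first node
    // matching the XPath expression, or null if nothing matches.
    public static string ReadValue(string xml, string xpath)
    {
        using (var reader = new StringReader(xml))
        {
            // XPathDocument is a read-only store optimized for XPath queries.
            var doc = new XPathDocument(reader);
            XPathNavigator node = doc.CreateNavigator().SelectSingleNode(xpath);
            return node == null ? null : node.Value.Trim();
        }
    }

    public static void Main()
    {
        string xml =
            "<Company><Contact><phone>" +
            "<LandLine>(000) 555-5555</LandLine><Fax> (000) 555-5556 </Fax>" +
            "</phone></Contact></Company>";

        Console.WriteLine(ReadValue(xml, "/Company/Contact/phone/LandLine")); // (000) 555-5555
        Console.WriteLine(ReadValue(xml, "/Company/Contact/phone/Fax"));      // (000) 555-5556
        Console.WriteLine(ReadValue(xml, "/Company/Contact/phone/Mobile") == null); // True
    }
}
```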

Answer 5

+2 A:

I don't think you're too far off. There are more convenient methods (lots of different approaches). Assuming you want to take the same basic approach as you do here (and it is an efficient if verbose one), I'd do:

bool inPhone = false;
string landLine = null;
string fax = null;

using(XmlReader xml = XmlReader.Create(websiteResultStream, xmlSettings))
while(xml.Read())
{
  switch(xml.NodeType)
  {
    case XmlNodeType.Element:
      switch(xml.LocalName)
      {
        case "phone":
          inPhone = true;
          break;
        case "LandLine":
          if(inPhone)
          {
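            // Note: ReadElementContentAsString() leaves the reader on the node after </LandLine>,
            // so the next Read() skips that node (harmless when whitespace separates the elements).
            landLine = xml.ReadElementContentAsString();
            if(fax != null)
            {
              DoWhatWeWantToDoWithTheseValues(landLine, fax);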
              return;
            }
          }
          break;
        case "Fax":
          if(inPhone)
          {
            fax = xml.ReadElementContentAsString();
            if(landLine != null)
            {
              DoWhatWeWantToDoWithTheseValues(landLine, fax);
              return;
            }
          }
          break;
      }
      break;
    case XmlNodeType.EndElement:
      if(xml.LocalName == "phone")
        inPhone = false;
      break;
  }
}

Note that this tracks whether the reader is "inside" a phone element, whereas your version would also pick up a LandLine inside some later element, which you seem to be trying to avoid.

Note also that the `using` block cleans up the XmlReader, even though we return as soon as we have all the information we want.

Jon Hanna 2010-08-18 14:58:40
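
One possible variation on the streaming approach, sketched under the assumption that only the first phone element matters. It uses `ReadToFollowing` and `ReadSubtree` to confine the scan to that element, and it only calls `Read()` when no content was consumed, because `ReadElementContentAsString()` already moves the reader past the end tag. The class `PhoneReader` and the method `ReadPhone` are hypothetical names for this example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class PhoneReader
{
    // Hypothetical helper: streams through the first <phone> element and
    // returns { landLine, fax }; an entry is null if the element is absent.
    public static string[] ReadPhone(string xmlText)
    {
        string landLine = null;
        string fax = null;

        using (XmlReader xml = XmlReader.Create(new StringReader(xmlText)))
        {
            if (xml.ReadToFollowing("phone"))
            {
                // The subtree reader ends at </phone>, so no inPhone flag is needed.
                using (XmlReader phone = xml.ReadSubtree())
                {
                    while (!phone.EOF)
                    {
                        if (phone.NodeType == XmlNodeType.Element && phone.LocalName == "LandLine")
                            landLine = phone.ReadElementContentAsString(); // already advances the reader
                        else if (phone.NodeType == XmlNodeType.Element && phone.LocalName == "Fax")
                            fax = phone.ReadElementContentAsString();      // already advances the reader
                        else
                            phone.Read(); // advance only when nothing was consumed
                    }
                }
            }
        }
        return new[] { landLine, fax };
    }

    public static void Main()
    {
        // No whitespace between elements, the case where a plain while(Read()) loop can skip a node.
        string[] result = ReadPhone(
            "<Company><Contact><phone><LandLine>a</LandLine><Fax>b</Fax></phone></Contact></Company>");
        Console.WriteLine(result[0] + "," + result[1]);
    }
}
```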
