I am trying to parse an XML response from a website in C#. The response comes in a format similar to the following:

<Company>
    <Owner>Bob</Owner>
    <Contact>
        <address> -1 Infinite Loop </address>
        <phone>
            <LandLine>(000) 555-5555</LandLine>
            <Fax> (000) 555-5556 </Fax>
        </phone>
        <email> [email protected] </email>
    </Contact>
</Company>

The only information I want is the LandLine and Fax numbers. However, my current approach seems to be of really poor quality. Essentially it is a bunch of nested while loops that check the element name and then read the content once the right element is found. I am using something like the listing below:

XmlReader xml = XmlReader.Create(websiteResultStream, xmlSettings);

while(xml.Read()){
    if(xml.NodeType == XmlNodeType.Element){
        if(xml.Name.ToString() == "Phone"){
            while(xml.Read()) {
                if(xml.NodeType == XmlNodeType.Element) {
                     if(xml.Name.ToString() == "LandLine"){
                          xml.MoveToContent();
                          xml.ReadContentAsString();
                     }
                     if(xml.Name.ToString() == "Fax"){
                          xml.MoveToContent();
                          xml.ReadContentAsString();
                     }
                }
            }
        }
    }
}

I am new to XML/C#, but the above method just screams bad code! I want to ensure that the code stays robust if the structure changes (i.e. there are additional phone number types like "mobile"), hence the additional while loops.

Note: the above C# code is not exact, and lacks some checks etc, but it demonstrates my current abysmal disgusting approach

What is the best/cleanest way to simply extract the content from those two Elements if they are present?

+2  A: 

The following article might be of use

http://www.java2s.com/Code/CSharp/XML/FindElementswithanXPathSearch.htm

Hatch
This works but is not optimal for read-only access. The sample uses `XmlDocument` which builds up a complete DOM tree in memory that you usually won't need.
0xA3
+7  A: 

Use LINQ-to-XML:

var doc = XDocument.Parse(@"<Company>
    <Owner>Bob</Owner>
    <Contact>
        <address> -1 Infinite Loop </address>
        <phone>
            <LandLine>(000) 555-5555</LandLine>
            <Fax> (000) 555-5556 </Fax>
        </phone>
        <email> [email protected] </email>
    </Contact>
</Company>");

var phone = doc.Root.Element("Contact").Element("phone");

Console.WriteLine((string)phone.Element("LandLine"));
Console.WriteLine((string)phone.Element("Fax"));

Output:

(000) 555-5555
 (000) 555-5556
dtb
Note that if Contact is missing, you'll get an exception on the `var phone = ...` line. I like to do things like `var contactNode = doc.Root.Element("Contact") ?? new XElement("Contact");` so I always have a node returned, and then when I do `var phone = contactNode.Element("phone") ?? new XElement("phone");` I won't get null reference errors. In the end, I just end up with blank values for the variables. Alternatively, use an XSD to validate the document prior to parsing to ensure the nodes you want exist.
Chad
Note that the `XDocument` class also comes with the overhead of building up a DOM tree in memory; usually not what you need for read-only random access to nodes in the document, especially when you deal with large documents.
0xA3
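Following up on the null-handling comment above: on C# 6 or later, the null-conditional operator can stand in for the placeholder elements. This is only a minimal sketch, assuming the element names from the question; `PhoneLookup` and `GetPhoneValue` are made-up names for illustration, not part of any library.

```csharp
using System.Xml.Linq;

static class PhoneLookup
{
    // Hypothetical helper: returns the trimmed text of
    // Company/Contact/phone/<name>, or null if any node on the path is missing.
    public static string GetPhoneValue(XDocument doc, string name)
    {
        // Each ?. short-circuits to null instead of throwing
        // when the element to its left is absent.
        XElement phone = doc.Root?.Element("Contact")?.Element("phone");

        // The explicit XElement-to-string conversion returns null for a null element.
        return ((string)phone?.Element(name))?.Trim();
    }
}
```

With the sample document, `PhoneLookup.GetPhoneValue(doc, "Fax")` gives `(000) 555-5556`, and asking for a name that is not there, such as `"Mobile"`, gives null rather than throwing.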
+1  A: 

The best way to do that is to use XPath. See this article for reference: http://support.microsoft.com/kb/308333

and this article for how to do it: http://www.codeproject.com/KB/cpp/myXPath.aspx

icemanind
+5  A: 

The most light-weight approach for read-only access to specific nodes in an XML document is by using an XPathDocument together with an XPath expression:

XPathDocument xdoc = new XPathDocument(@"C:\sample\document.xml");
XPathNavigator node = xdoc.CreateNavigator()
    .SelectSingleNode("/Company/Contact/phone/LandLine");
if (node != null)
{
    string landline = node.Value;
}
0xA3
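The same idea can be pointed at the web response instead of a file on disk, and reused for both numbers. A rough sketch, assuming the response is available as a `TextReader` (`XPathDocument` also has a constructor that takes a `Stream`); `PhoneXPath`, `Load` and `ValueOrNull` are illustrative names only.

```csharp
using System.IO;
using System.Xml.XPath;

static class PhoneXPath
{
    // Hypothetical helper: loads the XML once into a read-only XPathDocument.
    public static XPathNavigator Load(TextReader xml)
    {
        return new XPathDocument(xml).CreateNavigator();
    }

    // Hypothetical helper: returns the trimmed value of the first node
    // matching the XPath expression, or null when nothing matches.
    public static string ValueOrNull(XPathNavigator nav, string xpath)
    {
        XPathNavigator node = nav.SelectSingleNode(xpath);
        return node?.Value.Trim();
    }
}
```

Calling `ValueOrNull(nav, "/Company/Contact/phone/LandLine")` and then `ValueOrNull(nav, "/Company/Contact/phone/Fax")` on the same navigator picks up both numbers, and an additional type such as a mobile number would only need one more expression.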
+2  A: 

I don't think you're too far off. There are more convenient methods (lots of different approaches). Assuming you want to take the same basic approach as you do here (and it is an efficient if verbose one), I'd do:

bool inPhone = false;
string landLine = null;
string fax = null;

using(var xml = XmlReader.Create(websiteResultStream, xmlSettings))
while(xml.Read())
{
  switch(xml.NodeType)
  {
    case XmlNodeType.Element:
      switch(xml.LocalName)
      {
        case "phone":
          inPhone = true;
          break;
        case "LandLine":
          if(inPhone)
          {
            landLine = xml.ReadElementContentAsString();
            if(fax != null)
            {
              DoWhatWeWantToDoWithTheseValues(landLine, fax);
              return;
            }
          }
          break;
        case "Fax":
          if(inPhone)
          {
            fax = xml.ReadElementContentAsString();
            if(landLine != null)
            {
              DoWhatWeWantToDoWithTheseValues(landLine, fax);
              return;
            }
          }
          break;
      }
      break;
    case XmlNodeType.EndElement:
      if(xml.LocalName == "phone")
        inPhone = false;
      break;
  }
}

Note that this tracks whether the reader is "inside" a phone element, whereas the code in the question would also pick up a LandLine or Fax inside any later element, which you seem to be trying to avoid.

Note also that the using block cleans up the XmlReader, and that we return as soon as we have all the information we want.

Jon Hanna
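One thing to watch when staying with XmlReader: `ReadElementContentAsString` leaves the reader positioned on the node after the element's end tag, so calling `Read()` straight afterwards can step over an adjacent element when whitespace nodes are ignored. The sketch below avoids that by only calling `Read()` when nothing was consumed, and it collects every child of `phone` so an extra type such as "mobile" would be picked up too. It assumes each child holds plain text only; `PhoneReader` and `ReadPhoneNumbers` are made-up names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class PhoneReader
{
    // Hypothetical helper: maps each child element name of the first <phone>
    // element to its trimmed text. Assumes the children contain text only
    // (ReadElementContentAsString throws on nested elements).
    public static Dictionary<string, string> ReadPhoneNumbers(TextReader input)
    {
        var numbers = new Dictionary<string, string>();
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            if (!reader.ReadToFollowing("phone"))
                return numbers; // no <phone> element at all

            int childDepth = reader.Depth + 1;
            reader.Read(); // step to the first node after the <phone> start tag

            // Nodes inside <phone> sit at childDepth or deeper; reaching
            // </phone> (or the end of the input) drops below that and ends the loop.
            while (reader.Depth >= childDepth)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == childDepth)
                {
                    // Consumes the whole element and already advances the reader,
                    // so there is no extra Read() on this branch.
                    string name = reader.LocalName;
                    numbers[name] = reader.ReadElementContentAsString().Trim();
                }
                else
                {
                    reader.Read(); // skip comments and other non-element nodes
                }
            }
        }
        return numbers;
    }
}
```

The caller can then use `TryGetValue("LandLine", ...)` and `TryGetValue("Fax", ...)` on the result and simply ignore any names it does not care about.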