I have a string that contains well-formed XML. I want to navigate the XML in that string and extract the text of certain nodes. How do I accomplish this efficiently using a built-in .NET class? Which .NET XML class would you use, and why?

Many thanks for your help.

Note 1: LINQ is not available to me. Note 2: Editing the XML is not important; read-only access is all I need.

+3  A: 

I would use XmlDocument.LoadXml() to get a DOM from the string (XmlDocument.Load() expects a file path, stream, or reader rather than the XML text itself). Then you can traverse it using the appropriate DOM methods or XPath as needed.

jeffamaphone
+2  A: 

It depends on the structure of the XML. If it is relatively simple, then the most efficient way is to wrap the string in a StringReader, and then wrap that in an XmlReader. The benefit is that you won't have to build an XML tree in memory, copying data out of the string; you'll just read nodes one by one.

If the document structure is complicated enough, you might need (or want) a DOM - in which case XDocument.Parse should do the trick.

Pavel Minaev
Yours sounds like a good idea, but I don't have access to LINQ.
Newbie
I don't see the point of wrapping it inside a StringReader. As far as I know, the only purpose of StringReader is to provide mutable strings, so that when a lot of string operations are performed, the overhead of creating a new string for every manipulation goes away. I don't see how that fits here.
Henri
@Henri: You're confusing `StringWriter` with `StringReader`. `StringReader` is used here to provide a `TextReader` interface on top of a plain string, because `XmlReader.Create` expects a `TextReader` (its overload that takes a string treats the argument as a URI, not as XML content).
Pavel Minaev
@Newbie: if you don't have LINQ, then pick either `XmlDocument` or `XPathDocument` depending on your requirements. If you're only going to read data, `XPathDocument` would likely be better.
Pavel Minaev
+3  A: 

For navigation? Probably XPathDocument:

string s = @"<xml/>";
XPathDocument doc = new XPathDocument(new StringReader(s));

From MSDN,

Provides a fast, read-only, in-memory representation of an XML document by using the XPath data model.

Unlike XmlDocument etc., it is optimised for read-only usage: more efficient but less powerful (i.e. you can't edit it). For notes on how to query it, see here.
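
As a rough sketch of what querying can look like (not necessarily what the linked notes describe), the navigator returned by CreateNavigator() accepts XPath expressions directly. The class name XPathDocumentSketch, the helper FirstValue and the sample XML in the usage note are made up for illustration; only XPathDocument, CreateNavigator, SelectSingleNode and Value are framework members.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper, for illustration only.
static class XPathDocumentSketch
{
    // Returns the string value of the first node matching xpath,
    // or null when nothing matches.
    public static string FirstValue(string xml, string xpath)
    {
        using (StringReader sr = new StringReader(xml))
        {
            // The whole document is parsed here, into a read-only store.
            XPathDocument doc = new XPathDocument(sr);
            XPathNavigator nav = doc.CreateNavigator();

            // SelectSingleNode returns null if the expression selects nothing.
            XPathNavigator hit = nav.SelectSingleNode(xpath);
            return hit == null ? null : hit.Value;
        }
    }
}
```

For example, FirstValue("<order><customer><name>Ada</name></customer></order>", "/order/customer/name") should give back the text of the name element. If you need several values from the same string, it is probably better to build the XPathDocument once and reuse its navigator instead of re-parsing for every query.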

Marc Gravell
It should be noted that `XPathDocument` is actually _significantly_ faster on some kinds of XPath queries; notably anything involving the `preceding` or `preceding-sibling` axes.
Pavel Minaev
+3  A: 

For speed, use an XmlReader:

using (StringReader sr = new StringReader(myString))
using (XmlReader xr = XmlReader.Create(sr))
{
    while (xr.Read())
    {
        if (xr.NodeType == XmlNodeType.Element && xr.Name == "foo")
        {
            Console.WriteLine(xr.ReadString());
        }
    }
}

The above prints out the text content of every element named "foo" in the XML document. (Well, sort of. ReadString doesn't handle nested elements very gracefully.)
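
If the "foo" elements can contain child elements, one possible workaround is to walk each match with ReadSubtree() and concatenate the text nodes yourself. This is only a sketch: the class name XmlReaderTextSketch and the method TextOfElements are invented for the example, and it assumes you want all descendant text joined together. A "foo" nested inside another "foo" is folded into the outer result rather than reported separately.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, for illustration only.
static class XmlReaderTextSketch
{
    // Collects the concatenated text of every element called elementName.
    public static List<string> TextOfElements(string xml, string elementName)
    {
        List<string> results = new List<string>();
        using (StringReader sr = new StringReader(xml))
        using (XmlReader xr = XmlReader.Create(sr))
        {
            while (xr.Read())
            {
                if (xr.NodeType == XmlNodeType.Element && xr.Name == elementName)
                {
                    StringBuilder sb = new StringBuilder();
                    // ReadSubtree yields a reader limited to this element and
                    // its descendants; disposing it leaves xr at the end of
                    // the element, so the outer loop carries on after it.
                    using (XmlReader sub = xr.ReadSubtree())
                    {
                        while (sub.Read())
                        {
                            if (sub.NodeType == XmlNodeType.Text ||
                                sub.NodeType == XmlNodeType.CDATA)
                            {
                                sb.Append(sub.Value);
                            }
                        }
                    }
                    results.Add(sb.ToString());
                }
            }
        }
        return results;
    }
}
```

This keeps the streaming behaviour of XmlReader: nothing beyond the current element's text is held in memory.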

Using an XPathDocument is slower, because the entire document gets parsed before you can start searching it, but it has the merit of simplicity:

using (StringReader sr = new StringReader(myString))
{
    XPathDocument d = new XPathDocument(sr);
    foreach (XPathNavigator n in d.CreateNavigator().Select("//foo/text()"))
    {
        Console.WriteLine(n.Value);
    }
}

If you're not concerned with performance or memory utilization, it's simplest to use an XmlDocument:

XmlDocument d = new XmlDocument();
d.LoadXml(myString);
foreach (XmlNode n in d.SelectNodes("//foo/text()"))
{
    Console.WriteLine(n.Value);
}
Robert Rossney
Nice answer with code samples. As a side note, `XPathNodeIterator` implements `IEnumerable`, so there's no need to use `while` - `foreach` will do the trick as well, and is easier to read.
Pavel Minaev
Right you are; I edited the example to show that.
Robert Rossney