I have a simple structured XML file like this:

<ttest ID="ttest00001" NickName="map00001"/>
<ttest ID="ttest00002" NickName="map00002"/>
<ttest ID="ttest00003" NickName="map00003"/>
<ttest ID="ttest00004" NickName="map00004"/>

... and so on. The XML file can be around 2.5 MB.

In my source code I have a loop that fetches nicknames. On each iteration I do something like this:

nickNameLoopNum = MyXmlDoc.SelectSingleNode("//ttest[@ID='" + testloopNum + "']").Attributes["NickName"].Value;

This single line costs me 30 to 40 milliseconds.

I found some old articles (dating back to 2002) saying that some sort of compiled XPath can help here, but that was 5 years ago. Is there a modern practice to make this faster? (I'm using .NET 3.5.)

A: 

In this case you might want to consider reading the nicknames in the XML file into an array (if your test IDs are really just sequential integers) or a dictionary (if not) up front, then using that to locate each nickname, rather than trying to do a bunch of XPath queries. You would probably get much better performance on lookups that way.

Edit: Something like this (pseudo-code)

var nicknames = new Dictionary<string, string>();

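// Assumes the ttest elements are direct children of the root element.
foreach (XmlNode node in MyXmlDoc.DocumentElement.ChildNodes)
{
    // Skip comments, whitespace and other non-element nodes.
    if (node is XmlElement)
    {
        nicknames.Add(node.Attributes["ID"].Value, node.Attributes["NickName"].Value);
    }
}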

...

nickNameLoopNum = nicknames[testLoopNum];
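
A variation on the same build-once, look-up-many idea, as a minimal sketch that swaps XmlDocument for LINQ to XML (available in .NET 3.5). The class name NickNameIndex and its Build method are hypothetical, and the sketch assumes that every ttest element is a direct child of the root, carries both attributes, and has a unique ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class NickNameIndex
{
    // Hypothetical helper: parses the XML once and indexes NickName by ID.
    // Assumption: ttest elements are direct children of the root,
    // each has ID and NickName attributes, and IDs are unique
    // (ToDictionary throws on a duplicate or null key).
    public static Dictionary<string, string> Build(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
                  .Elements("ttest")
                  .ToDictionary(e => (string)e.Attribute("ID"),
                                e => (string)e.Attribute("NickName"));
    }
}
```

Inside the loop, each lookup then becomes a hash lookup such as nicknames.TryGetValue(testLoopNum, out nickName), with no XPath evaluation per iteration. For a file on disk, XDocument.Load could probably replace XDocument.Parse.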
Eric Rosenberger
Many thanks Eric. Your code helped me.
+2  A: 

You're already using XPath ("//ttest..."), and the "//" form is the slowest way to reach the nodes because it searches the entire document.

try something like...

foreach (XmlNode node in MyXmlDoc.ChildNodes) {
    ...
}

instead. No XPath is required and it should be quicker. (This implicitly assumes a 'flat' XML file with no nesting. If it isn't flat, you'll be recursing soon, my lad.)
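
For the nested case, the recursion could look roughly like the sketch below. The names NickNameWalker and Collect are hypothetical, and the sketch assumes the elements of interest are called ttest and carry ID and NickName attributes, as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class NickNameWalker
{
    // Hypothetical helper: walks the tree depth-first without XPath and
    // records ID -> NickName for every ttest element at any depth.
    public static void Collect(XmlNode parent, IDictionary<string, string> result)
    {
        foreach (XmlNode child in parent.ChildNodes)
        {
            XmlElement element = child as XmlElement;
            if (element == null)
            {
                continue; // skip comments, text, whitespace, declarations
            }

            if (element.Name == "ttest" && element.HasAttribute("ID"))
            {
                // GetAttribute returns an empty string if NickName is missing.
                result[element.GetAttribute("ID")] = element.GetAttribute("NickName");
            }

            Collect(element, result); // descend into nested elements
        }
    }
}
```

Calling NickNameWalker.Collect(MyXmlDoc, map) once before the loop fills the dictionary in a single pass, so the per-iteration work is just a dictionary lookup.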

marcus.greasly
Thanks a lot. this shed the light for me. cool
+1  A: 

Using the "//" abbreviation in an XPath expression results in big inefficiency as it causes the whole XML document to be searched. Using '//' repeatedly multiplies this inefficiency.

One efficient solution to the problem is to obtain all "NickName" attribute nodes by evaluating just one single XPath expression:

   ttest/@NickName

where the context node is the parent of all "ttest" elements.

The C# code will look like the following:

    int n = 15;
    XmlDocument doc = new XmlDocument();
    doc.Load("MyFile.xml");

    XmlNodeList nodeList;
    XmlNode top = doc.DocumentElement;
    nodeList =
        top.SelectNodes("ttest/@NickName");

    // Get the N-th NickName, can be done in a loop for
    // all n in a range

    string nickName = nodeList[n].Value;

Here we suppose that the "ttest" elements are children of the top element of the xml document.

To summarize: this solution evaluates an XPath expression only once and places all results in a convenient IEnumerable object (an XmlNodeList, which can be indexed like an array), so that any required item can then be accessed in constant, O(1), time.

Dimitre Novatchev
check my answer and test the code yourself. You will be surprised.
argatxa
@argatxa: Nowhere did I say that selecting the attributes directly is faster, only that selecting just the necessary nodes is faster than scanning the whole tree. If the whole tree consists only of the nodes we want to select, then either way works. However, in practice there are many cases where the XML document contains large subtrees that hold none of our nodes at all. Test these cases and be fair.
Dimitre Novatchev
A: 

In answer to Dimitre

Actually... selecting the whole node is quicker than selecting just the attributes.

I have a unit test benchmarking the code below and, amazingly, selecting the full node and then reading the attribute is quicker than selecting the attributes and getting the value straight away.

Put this in a 10,000-iteration loop and swap the comments to test each way.

    //XmlNodeList nodeList = document.SelectNodes("test/@NickName");
    XmlNodeList nodeList = document.SelectNodes("test");
    foreach (XmlNode node in nodeList)
    {
        //string nickName = node.Value;
        string nickName = ((XmlAttribute)node.Attributes.GetNamedItem("NickName")).Value;
    }

Counterintuitive, I know, but you have to measure!
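
One way to run such a measurement is sketched below, using System.Diagnostics.Stopwatch. The class name XPathTiming, the synthetic document built by BuildSample, and the root element name "root" are all assumptions made for illustration; they are not the benchmark described above, and results will vary with document shape and machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Xml;

static class XPathTiming
{
    // Builds a synthetic document shaped like the question's sample.
    // The root element name "root" is an assumption.
    public static XmlDocument BuildSample(int count)
    {
        var sb = new StringBuilder("<root>");
        for (int i = 1; i <= count; i++)
        {
            sb.AppendFormat("<ttest ID=\"ttest{0:D5}\" NickName=\"map{0:D5}\"/>", i);
        }
        sb.Append("</root>");

        var doc = new XmlDocument();
        doc.LoadXml(sb.ToString());
        return doc;
    }

    // Runs the action once as a warm-up, then times the given number of runs.
    public static TimeSpan Time(Action action, int iterations)
    {
        action();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

To compare the two approaches, pass each one to XPathTiming.Time as a lambda over the same document, for example one that loops over root.SelectNodes("ttest/@NickName") reading node.Value and one that loops over root.SelectNodes("ttest") reading the NickName attribute, and print both elapsed times. Trying it on a flat document and on one with large unrelated subtrees would show whether the result depends on document shape.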

argatxa