I have a big (~40 MB) collection of XML data, split across many files that are not well formed, so I merge them, add a root node, and load all the XML into an XmlDocument. It's basically a list of 3 different element types which can be nested in a few different ways. This example should show most of the cases:

<Root>
  <A>
    <A>
      <A></A>
      <A></A>
    </A>
  </A>
  <A />
  <B>
    <A>
      <A>
        <A></A>
        <A></A>
      </A>
    </A>
  </B>
  <C />
</Root>

I'm separating all A, B and C nodes by running XPath expressions on the XmlDocument (//A, //B, //C), converting the resulting node sets to a DataTable, and showing a list of all nodes of each node type separately in a DataGridView. This works fine.

But now I'm facing an even bigger file, and as soon as I load it, the grid shows only 4 rows. I added a breakpoint at the line where the actual XmlDocument.SelectNodes call happens and checked the resulting node list. It showed about 25,000 entries. After I continued, the program finished loading and, whoops, all my 25k rows were shown. I tried it again and I can reproduce it: if I step over XmlDocument.SelectNodes by hand, it works. If I don't break there, it does not. I'm not spawning a single thread in my application.

How can I debug this any further? What should I look for? I have experienced such behaviour with multithreaded libraries such as JSch (SSH), but I don't see why this should happen in my case.

Thank you very much!

// class XmlToDataTable:
private DataTable CreateTable(NamedXPath logType,
                              List<XmlColumn> columns,
                              ITableCreator tableCreator)
{
    // I have to break here -->
    XmlNodeList xmlNodeList = logFile.GetEntries(logType);
    // <-- I have to break here

    DataTable dataTable = tableCreator.CreateTableLayout(columns);
    foreach (XmlNode xmlNode in xmlNodeList)
    {
        DataRow row = dataTable.NewRow();
        tableCreator.PopulateRow(xmlNode, row, columns);
        dataTable.Rows.Add(row);
    }
    return dataTable;
}

// class Logfile:
public XmlNodeList GetEntries(NamedXPath e)
{
    return (_xmlDocument != null && _xmlDocument.HasChildNodes)
                         ? _xmlDocument.SelectNodes(e.XPath)
                         : new XmlNullObjectNodeList();
}
// _xmlDocument gets loaded here after reading all xml fragments into a string
// (ugly, i know. the  // ugly! comment reminds me about that ;))
private void CreateXmlDoc()
{
    _xmlDocument = new XmlDocument();
    _xmlDocument.LoadXml(OPEN_ROOT_ELEMENT + _xmlString +
                             CLOSE_ROOT_ELEMENT);
    if (DataChanged != null)
        DataChanged(this, new EventArgs());
}

// class NamedXPath:
public abstract class NamedXPath
{
    private readonly String _name;
    private readonly String _xPath;
    protected NamedXPath(string name, string xPath)
    {
        _name = name;
        _xPath = xPath;
    }

    public string Name
    {
        get { return _name; }
    }

    public string XPath
    {
        get { return _xPath; }
    }
}
A: 

Instead of using XPath directly in the code first, I would use a tool such as sketchPath to get the XPath right. You can load either your original XML or a subset of it.

Play with the XPath and your XML to check that the expected nodes are selected before using the XPath in your code.

Anunay
There are no fancy XPath expressions here; //A selects all nodes of type A in the document. I know that this is right, and I checked it twice with XPath Visualizer. Also, it works if I break at the right position.
atamanroman
A: 

Okay, solved it. tableCreator is part of my strategy pattern, which influences the way the table is built. In a certain implementation I do something like this:

XmlNode xn = xmlDocument.SelectSingleNode(fancyXPath);
// If a node has a nested <a> child, then it's a linked list:
// <a><a><a></a></a></a>
XmlNode child = xn.SelectSingleNode("a");
if (child != null)
    child.InnerText = "<IDs of linked list items CSV like here>";

Which means I'm replacing parts of an XML linked list with some text and losing the nested items there. This wouldn't be a problem if the change didn't affect the original XmlDocument. Even so, debugging it should not be too hard. What makes my program behave differently depending on whether I break or not seems to be the following:

Return Value: The first XmlNode that matches the XPath query or null if no matching node is found. The XmlNode should not be expected to be connected "live" to the XML document. That is, changes that appear in the XML document may not appear in the XmlNode, and vice versa. (API description of XmlNode.SelectSingleNode())

If I break there, the changes are written back to the original XmlDocument; if I don't break, they are not written back. I can't really explain that to myself, but without the change in the XmlNode everything works.
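
One way to keep the table-building code from touching the original document at all is to apply the replacement to a deep copy of the node. The following is only a sketch: the class name CloneDemo, the method CollapseOnCopy, and the placeholder text "1,2,3" are made up for illustration and are not part of the original program. XmlNode.CloneNode(true) copies the node together with its whole subtree.

```csharp
using System.Xml;

public static class CloneDemo
{
    // Hypothetical helper: collapses the nested <a> child on a deep copy,
    // so the node inside the original XmlDocument stays unchanged.
    public static string CollapseOnCopy(XmlNode original)
    {
        // Deep clone: the copy has the same owner document but no parent.
        XmlNode copy = original.CloneNode(true);

        XmlNode child = copy.SelectSingleNode("a");
        if (child != null)
            child.InnerText = "1,2,3"; // placeholder for the CSV of linked-list IDs

        return copy.OuterXml;
    }
}
```

With this approach the nested elements are only lost in the copy, so later XPath queries against the document still see every node.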

Edit: Now I'm quite sure: I had XmlNodeList.Count in my watches. This means that every time I debugged, VS called the Count property, which not only returns a number but also calls ReadUntil(int), which reads the remaining nodes into the internal list:

internal int ReadUntil(int index)
{
    int count = this.list.Count;
    while (!this.done && (count <= index))
    {
        if (this.nodeIterator.MoveNext())
        {
            XmlNode item = this.GetNode(this.nodeIterator.Current);
            if (item != null)
            {
                this.list.Add(item);
                count++;
            }
        }
        else
        {
            this.done = true;
            return count;
        }
    }
    return count;
}

This may have caused that weird behavior.
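
If that is what happens, the same effect as the watch window can be had on purpose by copying the result into a list before anything modifies the document. This is a minimal sketch under that assumption; SnapshotDemo, Snapshot and CountRowsAfterMutation are hypothetical names, and the XML is the example from the question in compact form.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class SnapshotDemo
{
    // Enumerates the whole result once, so the list is fully built
    // before the caller starts changing the document.
    public static List<XmlNode> Snapshot(XmlNodeList nodes)
    {
        var snapshot = new List<XmlNode>();
        foreach (XmlNode node in nodes)
            snapshot.Add(node);
        return snapshot;
    }

    public static int CountRowsAfterMutation()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<Root><A><A><A/><A/></A></A><A/>" +
                    "<B><A><A><A/><A/></A></A></B><C/></Root>");

        // Take the snapshot first, then mutate while iterating over the copy.
        List<XmlNode> rows = Snapshot(doc.SelectNodes("//A"));

        int processed = 0;
        foreach (XmlNode node in rows)
        {
            XmlNode child = node.SelectSingleNode("A");
            if (child != null)
                child.InnerText = "collapsed"; // drops the nested <A> elements
            processed++;
        }
        return processed;
    }
}
```

The loop runs over the copied list, so every node that matched //A at query time is processed, even though the loop body removes some of them from the document.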

atamanroman
What's probably happening is some kind of lazy evaluation in `SelectNodes`. If you break to the debugger and examine `xmlNodeList`, the debugger evaluates the query and builds the list of nodes, and when you continue, the fact that you've modified the XML document doesn't affect the list, as it's already been built. If there's lazy evaluation going on and the list isn't being built at the moment `SelectNodes` executes but rather as it's being enumerated, changes to the document will change the list. That's a wild-ass guess, though.
Robert Rossney