views:

1421

answers:

3

Hello.

I need to parse an XML file that is essentially an image of a really big tree structure, so I'm using the XmlReader class to populate the tree 'on the fly'. Each node is handed just the XML chunk it expects by its parent via the ReadSubtree() method. This has the advantage that a node never has to worry about detecting when it has consumed all of its children. But now I'm wondering whether this is actually a good idea, since there could be thousands of nodes, and while reading the .NET source files I've found that a couple (and probably more) new objects are created with every ReadSubtree() call, and no caching of reusable objects is done (that I could see).
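To make the question concrete, here is roughly the pattern I mean (TreeNode, its fields, and Parse are made-up illustrative names, not my actual code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class TreeNode
{
    public string Name;
    public List<TreeNode> Children = new List<TreeNode>();

    // Each node consumes exactly the chunk handed to it via ReadSubtree(),
    // so it can't accidentally read into a sibling's data.
    public static TreeNode Parse(XmlReader chunk)
    {
        chunk.MoveToContent();                        // position on this node's element
        var node = new TreeNode { Name = chunk.Name };
        while (chunk.Read())
        {
            if (chunk.NodeType == XmlNodeType.Element)
            {
                using (XmlReader child = chunk.ReadSubtree())
                    node.Children.Add(Parse(child));  // recurse with a bounded reader
            }
        }
        return node;
    }
}
```

Called as `TreeNode.Parse(XmlReader.Create(new StringReader(xml)))`, every recursion allocates a fresh subtree reader — which is exactly the per-node overhead I'm asking about.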

Maybe ReadSubtree() was not designed to be used this heavily, or maybe I'm worrying over nothing and just need to call GC.Collect() after parsing the file...

Hope someone can shed some light on this.

Thanks in advance.

+7  A: 

ReadSubtree() gives you an XmlReader that wraps the original XmlReader. This new reader appears to consumers as a complete document, which might be important if the code you pass the subtree to expects a standalone XML document; for example, the Depth property of the new reader starts out at 0. It is a pretty thin wrapper, so you won't be using noticeably more resources than you would by using the original XmlReader directly. In the scenario you describe, though, it is rather likely that you aren't really getting much out of the subtree reader.
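A small sketch of how the wrapper presents itself (the element names here are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml;

class Demo
{
    static void Main()
    {
        string xml = "<root><child><leaf/></child><sibling/></root>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("child");      // outer reader: Depth == 1
            Console.WriteLine(reader.Depth);      // 1
            using (XmlReader sub = reader.ReadSubtree())
            {
                sub.Read();                       // move the subtree reader onto <child>
                Console.WriteLine(sub.Depth);     // 0 — looks like a document root
                while (sub.Read()) { }            // exhausts at </child>; can't go past it
            }
            reader.Read();                        // outer reader resumes after the subtree
            Console.WriteLine(reader.Name);       // sibling
        }
    }
}
```

Disposing the subtree reader leaves the original reader positioned on the subtree's end element, so the outer parse simply picks up where the chunk ended.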

The big advantage in your case is that the subtree reader can't accidentally read past the end of its subtree. Since the subtree reader isn't very expensive, that safety alone might be worth it, though it is generally most helpful when you need the subtree to look like a document, or when you don't trust the consuming code to read only its own subtree.

As Will noted, you never want to call GC.Collect(). It will never improve performance.

Stefan Rusek
+1  A: 

Assuming that all the objects are created on the normal managed heap rather than the large object heap (i.e., they are smaller than 85 KB), there really should be no problem here; this is exactly what the GC was designed to deal with.

I would also suggest that there is no need to call GC.Collect at the end of the process: in almost all cases, letting the GC schedule collections itself allows it to work in the optimal manner (see this blog post for a very detailed explanation of GC, which explains this much better than I can).

Chris Ballard
A: 

Thanks for the nice and insightful answers.

I had a deeper look at the .NET source code and found it to be more complex than I first imagined. I've finally abandoned the idea of calling this method in this particular scenario. As Stefan pointed out, the XML reader is never passed to outsiders and I can trust the code that parses the XML stream (which I wrote myself), so I'd rather make each node responsible for the amount of data it takes from the stream than use the not-so-thin-in-the-end ReadSubtree() method just to save a few lines of code.
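For the record, this is roughly the kind of Depth-tracking alternative I mean — one shared reader, each node stopping at its own end tag (the TreeNode type and names are illustrative, not my actual code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class TreeNode
{
    public string Name;
    public List<TreeNode> Children = new List<TreeNode>();

    // One shared reader; each node stops at its own end tag by comparing Depth,
    // instead of being handed a bounded wrapper from ReadSubtree().
    public static TreeNode Parse(XmlReader reader)
    {
        // Precondition: reader is positioned on this node's start element.
        var node = new TreeNode { Name = reader.Name };
        if (reader.IsEmptyElement) return node;   // <leaf/> has nothing to consume
        int depth = reader.Depth;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                node.Children.Add(Parse(reader)); // child consumes its own subtree
            else if (reader.NodeType == XmlNodeType.EndElement && reader.Depth == depth)
                break;                            // our own end tag: stop here
        }
        return node;
    }
}
```

The caller just does `reader.MoveToContent()` to reach the root element and passes the reader in; no wrapper readers are ever allocated.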

Trap