Hello

I would like to know how to stream over a very large, deeply nested XML document using LINQ, filter nodes based on some criteria while streaming, and then write the filtered output to a file, preserving the original structure of the XML.

This should happen without loading the entire document into memory.

Is this possible?

+1  A: 

LINQ to XML doesn't support reading in a streaming fashion directly, but I've had success in using an XmlReader, filtering based on that, and then passing it to XElement.Load when I've discovered the subtree I'm interested in. It assumes that the subtree is small enough to fit into memory. When Load returns, the reader will have been moved beyond that subtree, and you can keep going until you find the next relevant subtree, etc.
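A minimal sketch of the approach described above, under a few assumptions: the class name `SubtreeStreaming`, the method `StreamElements`, and the choice to match on the element's local name are all hypothetical and only for illustration. It uses `XNode.ReadFrom` rather than `XElement.Load`, because as far as I recall `XElement.Load(XmlReader)` expects the reader to reach the end of the document after the element, whereas `XNode.ReadFrom` simply leaves the reader positioned after the subtree it has read.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SubtreeStreaming
{
    // Hypothetical helper: lazily yields every element whose local name is
    // `name`, loading only that one subtree into memory at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string name)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == name)
                {
                    // ReadFrom consumes the whole subtree and leaves the reader
                    // just past it, so Read() must not be called again here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

For a file, something like `SubtreeStreaming.StreamElements(File.OpenText(path), "item")` could then be queried with ordinary LINQ operators. Each yielded `XElement` is detached from its ancestors, which is why the output of this approach alone is flat.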

See this MSDN blog post for more information and sample code.

(This is what I did with the Stack Overflow data dump, btw :)

Jon Skeet
Hi Jon. Funny, I was just watching this video of you while this answer was posted: http://skillsmatter.com/podcast/open-source-dot-net/jon-skeet-talks-on-c-sharp-3-0 With regards to the answer, wouldn't that result in a flat output? I know how to stream in the XML using XmlReader and feed it to LINQ one by one; my problem is that upon filtering I lose the XML structure in the output.
aattia
A: 

For XML streaming options, check out the XML Team's discussion of streaming with LINQ to XML, starting with http://blogs.msdn.com/xmlteam/archive/2007/03/05/streaming-with-linq-to-xml-part-1.aspx. Note that it is an early blog series, and some implementation details changed in the final release.

Jim Wooley
Hi, I read that; the problem is that they do not go into depth about dealing with deeply nested structures. They do refer to this as a future Part 3 of the series, but I could never find it; I can only find Parts 1 and 2.
aattia
A: 

This paper contains the answer to my question:

http://homepages.cwi.nl/~ralf/api-streaming-xml/

Specifically, it shows how to maintain the tree structure of the original XML when filtering the results while streaming.
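Independently of the paper (this is not its method, just a plain `XmlReader`/`XmlWriter` sketch), one way to keep the tree structure is to copy the document node by node and skip any element that fails the filter, together with its whole subtree. The names `StreamingFilter`, `Copy`, and `shouldDrop` are hypothetical. The sketch assumes the filter can be decided from the start tag alone (element name and attributes), and it omits the XML declaration and DOCTYPE for brevity.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StreamingFilter
{
    // Hypothetical helper: copies input to output one node at a time and drops
    // every element (with its whole subtree) for which shouldDrop returns true.
    public static void Copy(TextReader input, TextWriter output, Func<XmlReader, bool> shouldDrop)
    {
        // Simplification: no XML declaration is written to the output.
        XmlWriterSettings settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && shouldDrop(reader))
                {
                    // Skip() jumps past the subtree; the reader is already on
                    // the next node, so do not call Read() again.
                    reader.Skip();
                    continue;
                }
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        bool isEmpty = reader.IsEmptyElement;
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        if (isEmpty) writer.WriteEndElement();
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                }
                reader.Read();
            }
        }
    }
}
```

With files, this could be called as `StreamingFilter.Copy(File.OpenText(inPath), File.CreateText(outPath), r => r.GetAttribute("keep") == "no")`, where the `keep` attribute is only an example criterion. Memory use stays bounded by the nesting depth rather than the document size, since no subtree is ever materialized.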

aattia