I have parsed XML using both of the following methods...

  • Parsing the XmlDocument using the object model and XPath queries.
  • XSLT

But I have never used...

  • The LINQ to XML object model that was new in .NET 3.5

Can anyone tell me the comparative efficiency of the three alternatives?

I realise that the particular usage would be a factor, but I just want a rough idea. For example, is the LINQ option massively slower than the others?

+1  A: 

I haven't actually tested it, but Linq is primarily a compiler code-gen type feature, and so it should be comparable to using an XmlDocument and XPath queries.

The primary value of Linq is that it provides compile-time verification of your query statements, which neither XPath nor XSLT can provide.

I would think that if performance is a concern, your decision would be based on the task at hand. For example, retrieving a single value from an XML document might be fastest using a single XPath query, but translating XML data into an HTML page would be faster using XSLT.
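To make the "single value" case concrete, here is a minimal sketch of the same lookup done both ways. The document shape and the element names (`order`, `customer`, `name`) are made up for illustration, and this says nothing about which is faster; it only shows what each style looks like.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class SingleValueLookup
{
    // XmlDocument + XPath: the query is a string that is parsed and evaluated at run time.
    public static string ViaXPath(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // parses the whole document into a DOM first
        XmlNode node = doc.SelectSingleNode("/order/customer/name");
        return node == null ? null : node.InnerText;
    }

    // LINQ to XML: the navigation is ordinary compiled C#,
    // but the element names are still plain strings.
    public static string ViaLinq(string xml)
    {
        XElement root = XElement.Parse(xml); // also parses the whole document
        XElement name = root.Element("customer")?.Element("name");
        return (string)name; // the explicit conversion yields null for a missing element
    }
}
```

Calling either method with `<order><customer><name>Ada</name></customer></order>` should return `Ada`. Note that both versions load the full document before querying it.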

Jeff B
Linq is not a "compiler code-gen". It is a bunch of static methods against generic IEnumerables: http://msdn.microsoft.com/en-us/library/system.linq.enumerable_methods.aspx LinqToXml does not provide compile-time guarantees, because the xml structure is unknown at compile time. Access by string
David B
Microsoft's stated purpose of Linq was to provide compile-time validation of the expression statement (for XPath, SQL, etc) as well as a single generic query syntax - not verification against the data source.
Jeff B
+1  A: 

LinqToXml queries work against the IEnumerable contract... most of its operations are O(N) because they require iteration over the IEnumerable.

If what you're starting with is a string containing xml, in order to work with it in Linq, you would need to parse it into the full object graph using XElement.Parse, then iterate over parts of it (to filter it, for example).
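As a rough sketch of that parse-then-iterate pattern (the `item` element and its `name` and `price` attributes are hypothetical, not from any real schema):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ParseThenFilter
{
    public static List<string> ExpensiveItemNames(string xml, decimal minPrice)
    {
        // XElement.Parse builds the complete in-memory tree before any filtering happens.
        XElement root = XElement.Parse(xml);

        // Where/Select then iterate the child elements one by one: an O(N) scan.
        return root.Elements("item")
                   // (decimal?) yields null for a missing attribute, so the comparison is false
                   .Where(i => (decimal?)i.Attribute("price") >= minPrice)
                   .Select(i => (string)i.Attribute("name"))
                   .ToList();
    }
}
```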

My understanding of XPath is that it will filter while parsing, which could be very advantageous from a performance standpoint. The full object graph need not be constructed.

David B
+23  A: 

The absolute fastest way to query an XML document is the hardest: write a method that uses an XmlReader to process the input stream, and have it process nodes as it reads them. This is the way to combine parsing and querying into a single operation. (Simply using XPath doesn't do this; both XmlDocument and XPathDocument parse the document in their Load methods.) This is usually only a good idea if you're processing extremely large streams of XML data.
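For a sense of what that looks like, here is a minimal sketch that collects one attribute from matching elements as they stream past. The element name `item` and attribute `name` are assumptions for the example; a real query usually needs more state tracking than this (depth, parent element, and so on), which is where the difficulty comes from.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class StreamingQuery
{
    public static List<string> ReadItemNames(TextReader input)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Read() advances one node at a time; no tree is ever built.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    string name = reader.GetAttribute("name"); // null if the attribute is absent
                    if (name != null)
                    {
                        names.Add(name);
                    }
                }
            }
        }
        return names;
    }
}
```

It can be fed from a file or network stream via a `StreamReader`, or from a string via `new StringReader(xml)`.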

All three methods you've described perform similarly. XSLT has a lot of room to be the slowest of the lot, because it lets you combine the inefficiencies of XPath with the inefficiencies of template matching. XPath and LINQ queries both do essentially the same thing, which is linear searching through enumerable lists of XML nodes. I would expect LINQ to be marginally faster in practice because an XPath expression is parsed and interpreted at runtime while a LINQ query is compiled along with the rest of your code.

But in general, how you write your query is going to have a much greater impact on execution speed than what technology you use.

The way to write fast queries against XML documents is the same whether you're using XPath or LINQ: formulate the query so that as few nodes as possible get visited during its execution. It doesn't matter which technology you use: a query that examines every node in the document is going to run a lot slower than one that examines only a small subset of them. Your ability to do that is more dependent on the structure of the XML than anything else: a document with a navigable hierarchy of elements is generally going to be a lot faster to query than one whose elements are all children of the document element.
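A small LINQ to XML sketch of the difference, assuming a hypothetical `library/books/book/title` layout. The XPath analogue would be `//title` versus `/library/books/book/title`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class NarrowVersusBroad
{
    // Broad: Descendants walks every element under the root looking for a match.
    public static List<string> TitlesBroad(XElement root)
    {
        return root.Descendants("title").Select(t => (string)t).ToList();
    }

    // Narrow: follows the known path, so only the children along that path are examined.
    public static List<string> TitlesNarrow(XElement root)
    {
        return root.Elements("books")
                   .Elements("book")
                   .Elements("title")
                   .Select(t => (string)t)
                   .ToList();
    }
}
```

The two are only interchangeable when every `title` lives on that path; if `title` elements also appear elsewhere in the document, the broad version will return those too.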

Edit:

While I'm pretty sure I'm right that the absolute fastest way to query an XML document is the hardest, the real fastest (and hardest) way doesn't use an XmlReader; it uses a state machine that directly processes characters from a stream. Like parsing XML with regular expressions, this is ordinarily a terrible idea. But it does give you the option of exchanging features for speed. By deciding not to handle those pieces of XML that you don't need for your application (e.g. namespace resolution, expansion of character entities, etc.) you can build something that will seek through a stream of characters faster than an XmlReader would. I can think of applications where this is even not a bad idea, though I can't think of many.

Robert Rossney
See also http://stackoverflow.com/questions/407350/how-best-to-use-xpath-with-very-large-xml-files-in-c/716659#716659, in which I point the reader to XPathReader, which combines the speed of XmlReader with the ease of use of XPath.
Richard Wolf
XPathReader is a really outstanding idea of which I was completely unaware. Thanks for pointing me at it.
Robert Rossney
The availability of PLINQ (Parallel Linq) in .NET 4.0 makes Linq an even more compelling option than before. To be fair, PLINQ is really just throwing more horsepower at the problem; it's not making the parser any more efficient. But overall Linq strikes a nice balance between brevity and performance.
Steve Wortham
A: 

If you want really fast XML processing (reading), you should consider using XmlReader. Unfortunately, the implementation is a bit hard.

There is also a way to combine LINQ with XmlReader, so you keep the ease of use of LINQ while getting much better performance than XmlDocument/XPath.

Please refer to the following link for more information on this: http://blogs.msdn.com/xmlteam/archive/2007/03/24/streaming-with-linq-to-xml-part-2.aspx
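A minimal sketch of that kind of combination, not necessarily the exact code from the linked post: an iterator uses XmlReader to find matching elements and `XNode.ReadFrom` to materialize only those elements, one at a time. The method name `StreamElements` is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class StreamingLinq
{
    // Lazily yields each element with the given name; only one such
    // element is held as an XElement at any time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the element and leaves the reader on the
                    // node that follows it, so we must not call Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

The result can be queried with ordinary LINQ operators, for example `StreamingLinq.StreamElements(new StringReader(xml), "item").Where(e => (string)e.Attribute("name") == "desk")`. Because matched elements are consumed whole, a matching element nested inside another matching element is not returned separately.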

Also, I think if you only work with small XML files, using XmlDocument/XPath won't be a performance issue.