So we have an XML file with a very simple implementation of XLink:

<root xmlns:xlink="http://www.w3.org/1999/xlink">
  <firstChild id="ID1" />
  ...
  <ref xlink:href="#ID1" />
</root>

Let's assume the XLink implementation won't get any more complicated than that. However the important point is that the element referred to (in this case firstChild) could appear anywhere in the document, anywhere in the hierarchy.

In an XPath lookup you could find the element that the ref node points to with an expression like:

//*[@id='ID1']

What's the best equivalent using Linq to XML? I'd have thought something along these lines:

XDocument xDoc = XDocument.Load("file.xml");
var dest = xDoc.Descendants().Where(e => (string)e.Attribute("id") == "ID1").SingleOrDefault();

Not actually tested yet. But in general terms, if the XML document is quite large, is the LINQ way going to be inefficient (since it's using an enumeration of all descendants on the XDocument)? Would it be better to revert to an XPathNavigator and just use the XPath expression?

If this kind of thing is okay to do in LINQ, is there a better method than what I wrote? LINQ is still only a few days old to me... it's awesome, but I wonder if it has efficiency limitations for certain operations.
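
For comparison, here is a minimal sketch of both lookups run against the same XDocument. It assumes C# 9 top-level statements and the `XPathSelectElement` extension method from `System.Xml.XPath`; the inline sample document is hypothetical and only mirrors the structure above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings the XPathSelectElement extension method into scope

// Hypothetical sample document with the target nested one level down.
var xDoc = XDocument.Parse(
    @"<root xmlns:xlink='http://www.w3.org/1999/xlink'>
        <group><firstChild id='ID1' /></group>
        <ref xlink:href='#ID1' />
      </root>");

// LINQ to XML: the (string) cast yields null for elements without an id attribute.
XElement viaLinq = xDoc.Descendants()
                       .SingleOrDefault(e => (string)e.Attribute("id") == "ID1");

// The same XPath expression, evaluated directly against the XDocument.
XElement viaXPath = xDoc.XPathSelectElement("//*[@id='ID1']");

Console.WriteLine(viaLinq.Name);
Console.WriteLine(viaXPath.Name);
```

Both calls walk the descendants, so this only shows that the two forms are interchangeable here, not that either is faster.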

Thanks!

+3  A: 

XPathNavigator isn't going to be any more efficient here, because it will still have to enumerate all descendants to find the match - there's no magic dust there. If you want it to be more efficient than that, you'll need an index, and no built-in XML API provides one out of the box, so you'll have to roll your own. For example:

XDocument doc = ...;
var id2elem = (from e in doc.Descendants()
               let id = e.Attribute("id")
               where id != null
               select new { Id = id.Value, Element = e })
              .ToDictionary(kv => kv.Id, kv => kv.Element);

and then use that dictionary to look up nodes by ID whenever you need to. Obviously, this is only worthwhile if lookups are relatively frequent, and not if you just need to do it once or twice.
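
As a rough sketch of how such an index might be used to resolve the `xlink:href` references from the question: the `Resolve` helper, the sample document, and the use of C# 9 top-level statements are all assumptions for illustration, and only same-document references of the form `#id` are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

XNamespace xlink = "http://www.w3.org/1999/xlink";

// Hypothetical sample document; the second ref points at an id that does not exist.
var doc = XDocument.Parse(
    @"<root xmlns:xlink='http://www.w3.org/1999/xlink'>
        <group><firstChild id='ID1' /></group>
        <ref xlink:href='#ID1' />
        <ref xlink:href='#MISSING' />
      </root>");

// Build the index once with a single pass over the document.
// Note: ToDictionary throws ArgumentException if two elements share an id.
Dictionary<string, XElement> id2elem = doc.Descendants()
    .Where(e => e.Attribute("id") != null)
    .ToDictionary(e => e.Attribute("id").Value);

// Hypothetical helper: returns the referenced element, or null if the
// href is absent, is not of the form "#id", or names an unknown id.
XElement Resolve(XElement reference)
{
    string href = (string)reference.Attribute(xlink + "href");
    if (href == null || !href.StartsWith("#", StringComparison.Ordinal))
        return null;
    return id2elem.TryGetValue(href.Substring(1), out XElement target) ? target : null;
}

foreach (XElement r in doc.Descendants("ref"))
{
    XElement target = Resolve(r);
    Console.WriteLine(target != null ? target.Name.LocalName : "(unresolved)");
}
```

Each lookup after the initial pass is a dictionary access rather than another scan of the document, which is where the saving comes from when there are many references.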

Pavel Minaev
Exactly what I was looking for :)
Rashmi Pandit
+1  A: 

If I were you, I would do it the same way, except for the where clause:

EDIT

from e in xDoc.Descendants()
let id = e.Attribute("id") != null ? e.Attribute("id").Value : null
where (id == "ID1")
select e

That way there is no need for the type cast.

Regarding your first question: as far as I know, MS has stopped development of its XPath implementation, so I believe that, even if not at the moment, LINQ to XML will end up much better optimised than XPath.

xandy
This will throw `NullReferenceException` if some node in the document doesn't have `@id`
Pavel Minaev
Also, there is no "optimization" in either XPath (as implemented in XPathNavigator) or LINQ! It's not magic. It works precisely as you'd expect it to, which is linear scans everywhere. It can't use indices, because how would it know what you're going to be looking up, and what to index? If anything, XPathNavigator may actually be faster for some types of queries because its nodes are doubly linked, while XLINQ ones are singly linked (so getting the preceding sibling in XPathDocument is O(1), but in XLINQ it's O(N)).
Pavel Minaev
Optimisation here means how MS implements the code. It's the same as how every browser implements its JS engine: currently Safari and Chrome top the speed charts and IE comes last. They are all tested using the same JS code, and the only difference is the browser's execution strategy and the algorithms within. The optimisation I mentioned is the same thing: MS will put effort into making LINQ run fast(er).
xandy
How exactly do you think it would be possible to make it run faster? Remember, there's still no magic. Any speed increase would have to come from better data structures and/or algorithms. What better ones do you know of that would speed it up? In practice, the only benefit XLINQ has is that you can use it with Parallel LINQ.
Pavel Minaev
By the way, your edited code is longer than his and doesn't buy anything in return. You say you want to avoid the type cast, but why? It uses `XAttribute`'s overloaded conversion operator to string, which does exactly the same thing as your code, only shorter.
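A small sketch of the conversion behaviour described here, assuming C# 9 top-level statements; the two sample elements are hypothetical. Casting a missing attribute to string yields null rather than throwing, so no explicit null check is needed.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical elements: one with an id attribute, one without.
XElement withId = XElement.Parse("<a id='ID1' />");
XElement withoutId = XElement.Parse("<b />");

// XAttribute defines an explicit conversion to string that accepts null.
string present = (string)withId.Attribute("id");
string missing = (string)withoutId.Attribute("id"); // Attribute() returned null; the cast passes it through

Console.WriteLine(present);
Console.WriteLine(missing == null);
```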
Pavel Minaev
@Pavel: I agree the edited code may not be good. But back to the magic: yes, there's no magic; I was just trying to address what Gavin asked about the choice between XPath and LINQ. Currently they have very similar underlying implementations, and searching for the id with XPath or with LINQ makes no big difference in terms of algorithm. Both should be O(# of nodes). What I am pointing out is that, even with the same algorithm, the execution strategy could differ, if only slightly. And MS's habit is to do its best for whatever it wants to support.
xandy
+1 for the instructive debate in the comments. I'll keep my code but agree with xandy that, although we all agree there's no magic, there's still potential for differences in the implementation; though it seems such differences would be too insignificant to worry much about. Thanks to you both.
Gavin Schultz-Ohkubo