Assuming the following HTML (minus the comments, "&nbsp;" entities, etc. that XQuery won't process as is), why does the following code work

for $first in fn:doc("file:///index.html")//element()[local-name() = "head"]
    return <test>{ $first }</test>

and this

for $first in fn:doc("file:///index.html")//head
return 
<test>{ $first }</test>

not work?

+3  A: 

Because index.html is XHTML and the <head> you are looking for is in the XHTML namespace.

The first query ignores namespaces because you use the local-name() function.

The second query does not, it explicitly asks for a <head> that is in the empty namespace.

You would need

declare namespace x="http://www.w3.org/1999/xhtml";

for $first in fn:doc("file:///index.html")/x:html/x:head
return <test>{ $first }</test>

Note that I avoid using //, since this goes through the entire tree of the document, even though in this case the only possible position of the <head> is known beforehand. Making it explicit speeds up the XPath query a lot.
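To see the namespace effect without depending on a file, here is a minimal sketch that builds a small XHTML document inline and counts the matches for each style of name test. The inline document is a hypothetical stand-in for index.html, and the `counts` element and its attribute names are made up for illustration.

```xquery
declare namespace x = "http://www.w3.org/1999/xhtml";

(: Hypothetical stand-in for index.html, built inline so no file is needed. :)
let $doc := document {
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Example</title></head>
    <body/>
  </html>
}
return
  <counts
    no-namespace="{ count($doc//head) }"
    xhtml-namespace="{ count($doc/x:html/x:head) }"
    local-name="{ count($doc//*[local-name() = 'head']) }"/>
```

The unprefixed `head` test should count 0, while the prefixed `x:head` test and the `local-name()` test should each count 1. The `xmlns` attribute on the constructor only sets the default namespace for the constructed elements, not for the path expressions in the `return` clause.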

Tomalak
Thanks! That's along the lines of what I expected to hear. I only used the // to reduce the chance that the XPath was returning an empty-sequence() in the test.
jtzero
Good. ;-) I'm just in the habit of cautioning people about careless use of the `descendant` axis. Many use it out of laziness without recognizing the consequences.
Tomalak
I keep seeing this advice about not using '//', but this is not always the case. In the .NET XPathNavigator each node contains a pointer to the next node in the document with the same name. Finding the first node does a scan through the document, but after that navigation is fast. This means that /a/b/c/d can be a lot slower than //d as the implementation has to find a, b, and c nodes on the way. I would presume that other implementations also have similar performance characteristics - in a database //d could require one index lookup, but /a/b/c/d require 4 (and 3 parent-child joins).
Oliver Hallam
@Oliver: `/a` is the document element - O(1). `/a/b` implicitly excludes all subtrees that are not under a `b` node; looking up `b` is O(n) at most, probably faster. `/a/b/c/d` means a node on the fourth level. There are many possibilities for optimization here, but `//d` is hardly optimizable. Even with your linked-list argument you have to be lucky to hit a `d` node early, or the expression will do a lot of useless tree traversal.
Tomalak
@Tomalak: Consider the difference between `/a/b` and `//b` (assuming these match the same nodes, of course, and assuming the linked-list indexing described). If the first `a` node is near the bottom of the document then yes, `/a/b` is faster. If, however, your document consists of m `a` nodes, each with a `b` node as its nth child, then `/a/b` requires scanning mn nodes (n nodes are scanned for each `a`), whereas `//b` requires scanning just n+1 nodes (to find the first `b`). Of course, how the document is indexed is implementation-dependent. As always, the only way to say which is better is to time them.
Oliver Hallam
@Oliver: There can only ever be one `/a` in the document. ;-) But I see the point of your argument, and I agree with it. I can imagine an implementation that builds a map of node names during DOM construction and then evaluates XPath expressions from right to left, starting with `b` in an expression like `/a/b` and discarding every `b` that has no `a` parent. CSS works like that, see http://blog.twoalex.com/2010/02/26/a-shocking-truth-about-css/. In that case `//b` can be really fast. I would not currently consider this the norm, though.
Tomalak
@Tomalak: In fact, this is exactly how XSLT `match` patterns work too. `/a/b/c` is really `self::c[parent::b[parent::a]]`. We have considered in the past (in XQuery) optimizing `//a/b` to `//b[parent::a]`.
Oliver Hallam
@Oliver: Thanks for taking the time to shed some light onto this, very much appreciated. :-)
Tomalak