How would you find all nodes between two H3's using XPATH?
In XPath 1.0 one way to do this is by using the Kayessian method for node-set intersection:
$ns1[count(.|$ns2) = count($ns2)]
The above expression selects exactly the nodes that are part of both the node-set $ns1 and the node-set $ns2.
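To see why the count comparison works, here is a minimal Python sketch of the same idea on plain sets. The names `kayessian_intersection`, `ns1` and `ns2` and the string labels are only illustrative, not part of XPath: adding a node to $ns2 leaves its size unchanged exactly when the node is already a member of $ns2.

```python
def kayessian_intersection(ns1, ns2):
    """Emulate $ns1[count(.|$ns2) = count($ns2)] on Python collections."""
    ns2 = set(ns2)
    # A node is in ns2 exactly when the union {node} | ns2 is no larger than ns2.
    return [node for node in ns1 if len({node} | ns2) == len(ns2)]


# Hypothetical labels standing in for XML nodes.
ns1 = ["a32", "b32", "h3-3", "a33"]
ns2 = ["h3-1", "a31", "h3-2", "a32", "b32"]
result = kayessian_intersection(ns1, ns2)
print(result)  # ['a32', 'b32']
```

Note that an empty second set selects nothing, because the union then always has one member while the set itself has none.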
To apply this to the specific question, let's say we need to select all nodes between the 2nd and 3rd h3 element in the following XML document:
<html>
<h3>Title T31</h3>
<a31/>
<b31/>
<h3>Title T32</h3>
<a32/>
<b32/>
<h3>Title T33</h3>
<a33/>
<b33/>
<h3>Title T34</h3>
<a34/>
<b34/>
<h3>Title T35</h3>
</html>
We have to substitute $ns1 with:
/*/h3[2]/following-sibling::node()
and substitute $ns2 with:
/*/h3[3]/preceding-sibling::node()
Thus, the complete XPath expression is:
/*/h3[2]/following-sibling::node()
[count(.|/*/h3[3]/preceding-sibling::node())
=
count(/*/h3[3]/preceding-sibling::node())
]
We can verify that this is the correct XPath expression with the following XSLT transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/h3[2]/following-sibling::node()
[count(.|/*/h3[3]/preceding-sibling::node())
=
count(/*/h3[3]/preceding-sibling::node())
]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the XML document presented above, the wanted, correct result is produced:
<a32/>
<b32/>
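If no XSLT processor is at hand, the same check can be sketched with Python's standard `xml.dom.minidom`. This is only an emulation under stated assumptions: minidom has no XPath engine, so the helper functions `following_siblings` and `preceding_siblings` (hypothetical names) walk the sibling links by hand, and the intersection is taken by node identity.

```python
from xml.dom import minidom

XML = """<html>
<h3>Title T31</h3>
<a31/>
<b31/>
<h3>Title T32</h3>
<a32/>
<b32/>
<h3>Title T33</h3>
<a33/>
<b33/>
<h3>Title T34</h3>
<a34/>
<b34/>
<h3>Title T35</h3>
</html>"""


def following_siblings(node):
    """All later siblings, like following-sibling::node()."""
    out, cur = [], node.nextSibling
    while cur is not None:
        out.append(cur)
        cur = cur.nextSibling
    return out


def preceding_siblings(node):
    """All earlier siblings, like preceding-sibling::node()."""
    out, cur = [], node.previousSibling
    while cur is not None:
        out.append(cur)
        cur = cur.previousSibling
    return out


doc = minidom.parseString(XML)
h3s = [n for n in doc.documentElement.childNodes
       if n.nodeType == n.ELEMENT_NODE and n.tagName == "h3"]

ns1 = following_siblings(h3s[1])            # /*/h3[2]/following-sibling::node()
ns2_ids = {id(n) for n in preceding_siblings(h3s[2])}  # /*/h3[3]/preceding-sibling::node()
between = [n for n in ns1 if id(n) in ns2_ids]         # intersection by identity

names = [n.tagName for n in between if n.nodeType == n.ELEMENT_NODE]
print(names)  # ['a32', 'b32']
```

As with node() in the XPath expression, `between` also holds the whitespace-only text nodes around the two elements; `names` filters them out for display.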
II. XPath 2.0 solution:
Use the intersect operator:
/*/h3[2]/following-sibling::node()
intersect
/*/h3[3]/preceding-sibling::node()
A more general solution in XPath 2.0, assuming you want nodes at all tree depths between the two h3 elements, which need not be siblings:
/path/to/first/h3/following::node()[. << /path/to/second/h3]
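What the following axis and the << (document order) operator do together can be sketched in Python with the standard `xml.dom.minidom`. This is an emulation, not an XPath 2.0 engine; `document_order`, `nodes_between` and the sample markup are made up for illustration.

```python
from xml.dom import minidom


def document_order(node):
    """Yield node and its descendants in document order (attributes ignored)."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)


def nodes_between(doc, start, end):
    """Emulate start/following::node()[. << end]."""
    order = list(document_order(doc))
    pos = {id(n): i for i, n in enumerate(order)}
    # following:: skips the start node and all of its descendants.
    first = pos[id(start)] + sum(1 for _ in document_order(start))
    return order[first:pos[id(end)]]


# Hypothetical document in which the two h3 elements are not siblings.
doc = minidom.parseString(
    "<body><div><h3>A</h3><p>one</p></div>"
    "<section><p>two</p><h3>B</h3><p>three</p></section></body>")
start, end = doc.getElementsByTagName("h3")
found = nodes_between(doc, start, end)
names = [n.tagName for n in found if n.nodeType == n.ELEMENT_NODE]
print(names)  # ['p', 'section', 'p']
```

Note that the section element wrapping the second h3 is selected too: it follows the first h3 and, being an ancestor, precedes the second one in document order.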
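Another XPath 1.0 solution, for when you know both markers are the same element (in this case h3). The path below assumes the h3 elements are children of /html/body, so adjust it for documents such as the sample above, which has no body element: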
/html/body/h3[2]/following-sibling::node()
[not(self::h3)]
[count(preceding-sibling::h3)=2]
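The counting idea in this expression can be sketched in Python with the standard `xml.dom.minidom`: a non-h3 child belongs to the n-th section exactly when n h3 siblings come before it. The function name `section_nodes` and the sample markup are hypothetical.

```python
from xml.dom import minidom


def section_nodes(parent, n):
    """Non-h3 children of parent that have exactly n preceding h3 siblings."""
    seen, out = 0, []
    for child in parent.childNodes:
        is_h3 = child.nodeType == child.ELEMENT_NODE and child.tagName == "h3"
        if is_h3:
            seen += 1          # one more h3 now precedes every later sibling
        elif seen == n:
            out.append(child)  # mirrors [not(self::h3)][count(preceding-sibling::h3)=n]
    return out


doc = minidom.parseString(
    "<body><h3>A</h3><p1/><h3>B</h3><p2/><p3/><h3>C</h3><p4/></body>")
second = [c.tagName for c in section_nodes(doc.documentElement, 2)]
last = [c.tagName for c in section_nodes(doc.documentElement, 3)]
print(second)  # ['p2', 'p3']
print(last)    # ['p4']
```

Because nothing here depends on a closing h3, the same count also picks up whatever follows the final heading.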
Based on dimitre-novatchev's excellent answer, I came up with the following solution: rather than hardcoding [2] and [3] for the different H3s, I just give the heading text of the first one.
//h3[text()="Main Page Section Heading"]/following-sibling::node()
[ count(.|//h3[text()="Main Page Section Heading"]/following-sibling::h3[1]/preceding-sibling::node()) =
count(//h3[text()="Main Page Section Heading"]/following-sibling::h3[1]/preceding-sibling::node()) ]
Where I'd like to go further, though, is handling the case where I'm looking at the last H3 and want everything after it. With the expression above I can't get what follows the last H3: there is no following h3, so the second node-set is empty and the count comparison never matches.
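For what it's worth, here is a rough Python sketch (standard `xml.dom.minidom`, not XPath) of the behaviour I'm after: collect the siblings after the heading with a given text and stop at the next h3 or at the end of the parent, whichever comes first. The helper names and the sample markup are made up for illustration.

```python
from xml.dom import minidom


def heading_text(h3):
    """Concatenate the direct text children of an h3 element."""
    return "".join(c.data for c in h3.childNodes if c.nodeType == c.TEXT_NODE)


def section_after(parent, title):
    """Siblings after the first h3 whose text equals title, up to the next h3 or the end."""
    out, collecting = [], False
    for child in parent.childNodes:
        is_h3 = child.nodeType == child.ELEMENT_NODE and child.tagName == "h3"
        if is_h3:
            if collecting:
                break  # reached the next heading
            collecting = heading_text(child) == title
        elif collecting:
            out.append(child)
    return out


doc = minidom.parseString(
    "<body><h3>Intro</h3><p1/>"
    "<h3>Main Page Section Heading</h3><p2/><p3/>"
    "<h3>Last</h3><p4/></body>")
body = doc.documentElement
middle = [n.tagName for n in section_after(body, "Main Page Section Heading")]
tail = [n.tagName for n in section_after(body, "Last")]
print(middle)  # ['p2', 'p3']
print(tail)    # ['p4']
```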