How would you find all nodes between two H3s using XPath?

+4  A: 

I. XPath 1.0 solution:

One way to do this in XPath 1.0 is the Kayessian method for node-set intersection:

$ns1[count(.|$ns2) = count($ns2)]

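The above expression selects exactly the nodes that belong to both the node-set $ns1 and the node-set $ns2. A node from $ns1 passes the predicate only if adding it to $ns2 leaves the size of $ns2 unchanged, that is, only if it is already a member of $ns2.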
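To see why the count comparison works, here is a minimal Python sketch (not XPath) that mimics the expression with plain sets. The helper name `kayessian_intersection` and the sample values are made up for illustration only.

```python
def kayessian_intersection(ns1, ns2):
    """Mimic $ns1[count(.|$ns2) = count($ns2)] using Python sets."""
    ns2 = set(ns2)
    # The union {node} | ns2 keeps the same size only if node is already in ns2.
    return [node for node in ns1 if len({node} | ns2) == len(ns2)]


ns1 = ["a", "b", "c", "d"]  # stand-in for $ns1
ns2 = ["c", "d", "e"]       # stand-in for $ns2
result = kayessian_intersection(ns1, ns2)
print(result)  # ['c', 'd']
```

The result keeps the order of the first list, much as the XPath predicate filters $ns1 in place.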

To apply this to the specific question -- let's say we need to select all nodes between the 2nd and 3rd h3 element in the following XML document:

<html>
  <h3>Title T31</h3>
    <a31/>
    <b31/>
  <h3>Title T32</h3>
    <a32/>
    <b32/>
  <h3>Title T33</h3>
    <a33/>
    <b33/>
  <h3>Title T34</h3>
    <a34/>
    <b34/>
  <h3>Title T35</h3>
</html>

We have to substitute $ns1 with:

/*/h3[2]/following-sibling::node()

and to substitute $ns2 with:

/*/h3[3]/preceding-sibling::node()

Thus, the complete XPath expression is:

/*/h3[2]/following-sibling::node()
             [count(.|/*/h3[3]/preceding-sibling::node())
             =
              count(/*/h3[3]/preceding-sibling::node())
             ]

We can verify that this is the correct XPath expression:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "/*/h3[2]/following-sibling::node()
             [count(.|/*/h3[3]/preceding-sibling::node())
             =
              count(/*/h3[3]/preceding-sibling::node())
             ]
   "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the XML document presented above, it produces the wanted, correct result:

<a32/>

<b32/>
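For readers without an XSLT processor at hand, the same check can be sketched in Python with the standard-library `xml.dom.minidom`. This is only a model of what the XPath expression computes, not an XPath evaluation; the helper names `following_siblings` and `preceding_siblings` are invented for this sketch.

```python
from xml.dom import minidom

XML = """<html>
  <h3>Title T31</h3>
    <a31/>
    <b31/>
  <h3>Title T32</h3>
    <a32/>
    <b32/>
  <h3>Title T33</h3>
    <a33/>
    <b33/>
  <h3>Title T34</h3>
    <a34/>
    <b34/>
  <h3>Title T35</h3>
</html>"""


def following_siblings(node):
    """All later siblings of node, like following-sibling::node()."""
    out, cur = [], node.nextSibling
    while cur is not None:
        out.append(cur)
        cur = cur.nextSibling
    return out


def preceding_siblings(node):
    """All earlier siblings of node, like preceding-sibling::node()."""
    out, cur = [], node.previousSibling
    while cur is not None:
        out.append(cur)
        cur = cur.previousSibling
    return out


root = minidom.parseString(XML).documentElement
h3s = [n for n in root.childNodes
       if n.nodeType == n.ELEMENT_NODE and n.tagName == "h3"]

ns1 = following_siblings(h3s[1])  # /*/h3[2]/following-sibling::node()
ns2 = preceding_siblings(h3s[2])  # /*/h3[3]/preceding-sibling::node()
between = [n for n in ns1 if any(n is m for m in ns2)]  # intersection by identity

# node() also matches the whitespace text nodes; keep only elements for display.
between_elements = [n.tagName for n in between if n.nodeType == n.ELEMENT_NODE]
print(between_elements)  # ['a32', 'b32']
```

Note that `between` also contains the whitespace-only text nodes around the two elements, just as `node()` selects them in XPath.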

II. XPath 2.0 solution:

Use the intersect operator:

   /*/h3[2]/following-sibling::node()
intersect
   /*/h3[3]/preceding-sibling::node()
Dimitre Novatchev
+1 may the right always prevail. :-)
LarsH
+1 for most general solution and Set Theory invocation
Alejandro
Nice stuff. It worked like a charm.
klumsy
So the one use case this doesn't cover is content after the last H3. I'm curious what modification would be needed to pick this up.
klumsy
@klumsy: Just prepend the existing expression with `/*/h3[2]/following-sibling::node()[not(/*/h3[3])] | `
Dimitre Novatchev
A: 

A more general solution, in XPath 2.0, assuming you want nodes at all tree depths between the two h3 elements, which need not be siblings:

/path/to/first/h3/following::node()[. << /path/to/second/h3]
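As a rough illustration of what this expression selects, the Python sketch below models document order with `xml.dom.minidom`. The sample document and the helper names are hypothetical; the second h3 is deliberately nested so that it is not a sibling of the first.

```python
from xml.dom import minidom

# Hypothetical document: the second h3 sits inside a div.
XML = ("<html><h3>One</h3><p>intro</p>"
       "<div><span/><h3>Two</h3><p>later</p></div></html>")


def document_order(node):
    """Yield node and its descendants in document (pre-order) order."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)


def is_descendant(node, ancestor):
    """True if ancestor appears on node's parent chain."""
    cur = node.parentNode
    while cur is not None:
        if cur is ancestor:
            return True
        cur = cur.parentNode
    return False


doc = minidom.parseString(XML)
nodes = list(document_order(doc.documentElement))
h3s = [n for n in nodes if n.nodeType == n.ELEMENT_NODE and n.tagName == "h3"]
first, second = h3s[0], h3s[1]


def position(node):
    """Index of node in document order."""
    return next(i for i, n in enumerate(nodes) if n is node)


# following::node() -> later in document order, minus descendants of `first`;
# [. << second]     -> only nodes that come before the second h3.
selected = [n for n in nodes
            if position(first) < position(n) < position(second)
            and not is_descendant(n, first)]
names = [n.nodeName for n in selected]
print(names)  # ['p', '#text', 'div', 'span']
```

In this model the `div` that wraps the second h3 is selected too, because an ancestor precedes its descendants in document order.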
Nick Jones
+1  A: 

Another XPath 1.0 solution, for when you know both markers are the same element (h3 in this case):

/html/body/h3[2]/following-sibling::node()
                           [not(self::h3)]
                           [count(preceding-sibling::h3)=2]
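The counting idea can be modelled in a few lines of Python with `xml.dom.minidom`: a node belongs to the second section if it is not an h3 itself and has exactly two h3 elements among its preceding siblings. The sample document and helper names below are assumptions made for this sketch.

```python
from xml.dom import minidom

# Hypothetical sample: three sections under a single parent element.
XML = "<body><h3>A</h3><a1/><h3>B</h3><a2/><b2/><h3>C</h3><a3/></body>"


def is_h3(node):
    return node.nodeType == node.ELEMENT_NODE and node.tagName == "h3"


def count_preceding_h3(node):
    """Model of count(preceding-sibling::h3)."""
    count, cur = 0, node.previousSibling
    while cur is not None:
        if is_h3(cur):
            count += 1
        cur = cur.previousSibling
    return count


body = minidom.parseString(XML).documentElement
h3s = [n for n in body.childNodes if is_h3(n)]

selected = []
cur = h3s[1].nextSibling  # h3[2]/following-sibling::node()
while cur is not None:
    # [not(self::h3)][count(preceding-sibling::h3)=2]
    if not is_h3(cur) and count_preceding_h3(cur) == 2:
        selected.append(cur.nodeName)
    cur = cur.nextSibling
print(selected)  # ['a2', 'b2']
```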
Alejandro
A: 

Based on Dimitre Novatchev's excellent answer, I came up with the following solution. Rather than hardcoding [2] and [3] for the different H3s, I just give the heading text of the first one.

//h3[text()="Main Page Section Heading"]/following-sibling::node()
 [  count(.|//h3[text()="Main Page Section Heading"]/following-sibling::h3[1]/preceding-sibling::node()) =  
    count(//h3[text()="Main Page Section Heading"]/following-sibling::h3[1]/preceding-sibling::node())  ]

Where I'd like to go further, though, is handling the case where I'm looking at the last H3 and want everything after it. With the expression above, I can't get what follows the last H3.
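The desired behaviour (stop at the next H3, or at the end of the parent if there is none) can be sketched in Python with `xml.dom.minidom`. This only models the selection and is not an XPath expression; the sample document, the function `section_after`, and the assumption that exactly one h3 carries the given text are all illustrative.

```python
from xml.dom import minidom

# Hypothetical page body with three headed sections.
XML = ("<body><h3>Intro</h3><p1/>"
       "<h3>Main Page Section Heading</h3><p2/><p3/>"
       "<h3>Last</h3><p4/></body>")


def is_h3(node):
    return node.nodeType == node.ELEMENT_NODE and node.tagName == "h3"


def heading_text(h3):
    """Concatenate the direct text children of an h3 element."""
    return "".join(c.data for c in h3.childNodes if c.nodeType == c.TEXT_NODE)


def section_after(parent, title):
    """Siblings after the h3 titled `title`, up to the next h3 or the end."""
    start = next(n for n in parent.childNodes
                 if is_h3(n) and heading_text(n) == title)
    out, cur = [], start.nextSibling
    while cur is not None and not is_h3(cur):
        out.append(cur)
        cur = cur.nextSibling
    return out


body = minidom.parseString(XML).documentElement
middle = [n.nodeName for n in section_after(body, "Main Page Section Heading")]
last = [n.nodeName for n in section_after(body, "Last")]
print(middle)  # ['p2', 'p3']
print(last)    # ['p4']
```

Because the loop ends on either condition, the last section needs no special case here.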

klumsy