I would like to construct an XPath query that returns a "div" or "table" element, as long as it has a descendant containing the text "abc". The one caveat is that the element itself cannot have any div or table descendants.

<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>

So the only correct result of this query would be:

/div/table/form/div 

My best attempt looks something like this:

//div[contains(//text(), "abc") and not(descendant::div or descendant::table)] | //table[contains(//text(), "abc") and not(descendant::div or descendant::table)]

but does not return the correct result.

Thanks for your help.

A: 

You could try:

//div[
  descendant::text()[contains(., "abc")] 
  and not(descendant::div or descendant::table)
] | 
//table[
  descendant::text()[contains(., "abc")] 
  and not(descendant::div or descendant::table)
]

Does that help?

Dennis Knochenwefel
A: 
//*[self::div|self::table] 
   [descendant::text()[contains(.,"abc")]]  
   [not(descendant::div|descendant::table)]

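The problem with contains(//text(), "abc") is that, in XPath 1.0, a function expecting a string converts a node-set argument to the string-value of its first node in document order only; the remaining nodes are ignored. In addition, //text() is an absolute path, so it selects the text nodes of the whole document rather than those below the div or table being tested.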
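As a rough illustration of that conversion rule, here is a small Python sketch. It does not use an XPath engine: it assumes the standard-library xml.dom.minidom and walks the tree by hand, and the helper name text_nodes is made up for this example. On the document from the question, the first text node is just the whitespace between the outer div and table tags, which is why the original predicate never matches.

```python
from xml.dom import minidom

XML = """<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>"""


def text_nodes(node):
    """Yield the text nodes below `node` in document order (hypothetical helper)."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        else:
            yield from text_nodes(child)


doc = minidom.parseString(XML)
all_text = list(text_nodes(doc))

# XPath 1.0 turns a node-set into a string by using its FIRST node only.
first_text = all_text[0].data

# Roughly what contains(//text(), "abc") ends up testing:
first_only = "abc" in first_text

# Roughly what descendant::text()[contains(., "abc")] tests from the root:
any_text = any("abc" in t.data for t in all_text)

print(repr(first_text), first_only, any_text)
```

The first value printed is a whitespace-only string, so the check against the first node fails even though a later text node does contain "abc".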

Alejandro
+2  A: 

Something different: :)

//text()[contains(.,'abc')]/ancestor::*[self::div or self::table][1]

Seems a lot shorter than the other solutions, doesn't it? :)

Translated to simple English: For any text node in the document that contains the string "abc", select its nearest ancestor that is either a div or a table.

This should also be more efficient, as only one full scan of the document tree is required, and each ancestor::* traversal is cheap compared to a descendant:: (subtree) scan.

To verify that this solution "really works":

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//text()[contains(.,'abc')]/ancestor::*[self::div or self::table][1] "/>
 </xsl:template>
</xsl:stylesheet>

when this transformation is performed on the provided XML document:

<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>

the wanted, correct result is produced:

<div>
   <span>
      <p>abcdefg</p>
   </span>
</div>

Note: It isn't necessary to use XSLT. Any XPath 1.0 host, such as DOM, must produce the same result.
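For readers without an XSLT processor at hand, the same selection can be approximated in plain Python. This is only a sketch under stated assumptions: it uses the standard-library xml.dom.minidom and emulates the expression by hand instead of evaluating it with an XPath engine, and the helper names text_nodes, nearest_div_or_table and path are invented for the example.

```python
from xml.dom import minidom

XML = """<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>"""


def text_nodes(node):
    """Yield the text nodes below `node` in document order (hypothetical helper)."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        else:
            yield from text_nodes(child)


def nearest_div_or_table(text_node):
    """Emulate ancestor::*[self::div or self::table][1]: walk upwards, nearest first."""
    node = text_node.parentNode
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if node.tagName in ("div", "table"):
            return node
        node = node.parentNode
    return None


def path(element):
    """Build a simple /a/b/c style path for display purposes."""
    parts = []
    while element is not None and element.nodeType == element.ELEMENT_NODE:
        parts.append(element.tagName)
        element = element.parentNode
    return "/" + "/".join(reversed(parts))


doc = minidom.parseString(XML)
matches = []
for t in text_nodes(doc):
    if "abc" in t.data:
        hit = nearest_div_or_table(t)
        # A node-set holds each node once, so skip duplicates.
        if hit is not None and hit not in matches:
            matches.append(hit)

print([path(m) for m in matches])
```

Because the walk goes from the text node upwards, the first div or table it meets is the nearest one, which is the node that position 1 denotes on the reverse ancestor:: axis.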

Dimitre Novatchev
thank you for your response and thank you for the +1. I prefer the compactness of this answer, however I'm unable to get it to work in my tests. The other two replies to this question work for me. Is it possible that there is a typo in your response? I can't claim to understand all of it. What does the [1] do? Again, if you have any insight as to why this answer doesn't work for me and the others do, I'd appreciate it. I would +1 for your time but I am new to this site and don't have the ability yet. Thanks.
juan234
@juan234: I have added to my answer some verification code that everyone can run to verify the correctness of the result. This verification shows that the expression is correct; there is *no* typo. You may have problems for different reasons, from using a non-compliant XPath 1.0 engine to issues in your code. To pinpoint the reason it is necessary to see your code. `[1]` means the first node of the node-set selected by the part of the expression that is immediately to the left of `[1]`. On reverse axes (such as `ancestor::`) the first node is the nearest one, which is actually the last node in document order.
Dimitre Novatchev
I'm convinced :)
juan234