ansaurus

Question

XPath - Searching for descendant elements that a) match a filter and b) don't have a specific ancestor.

Answer 1

A:

You can use a selector that limits the number of div[@class="frame"] ancestors to 1:

//div[@class="frame"][1]//*[@class="interesting" and count(ancestor::div[@class="frame"])=1]

it worked:

>>> import lxml.html
>>> data = """
        <div class="frame">
            <p class="interesting">alice</p>
            <p class="interesting">bob</p>
            <p class="interesting">carol</p>

            <div> 
                <div>
                    <h3 class="interesting">david</h3>
                </div>
            </div>

            <div class="frame">
                <p class="interesting">drevil</p>
            </div>
        </div>
    """
>>> tree = lxml.html.fromstring(data)
>>> tree.xpath('//div[@class="frame"][1]//*[@class="interesting" and count(ancestor::div[@class="frame"])=1]/text()')
['alice', 'bob', 'carol', 'david']

mykhal 2009-12-18 02:23:47

In plain language: for the first frame div, select all of its descendants with the "interesting" class, but only those that have exactly one frame div ancestor.

mykhal 2009-12-18 02:30:27
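For readers without lxml, the same filter can be spelled out by hand. This is a minimal sketch using only Python's standard-library `xml.etree.ElementTree` (an assumption: the answer relies on lxml's full XPath engine, and ElementTree's XPath subset supports neither the `ancestor` axis nor `count()`). The names `DATA`, `has_class`, and `interesting_with_one_frame_ancestor` are hypothetical. It takes the first frame div in document order and keeps each "interesting" descendant whose chain of ancestors contains exactly one frame div.

```python
import xml.etree.ElementTree as ET

# The markup from the lxml session above; it is also well-formed XML.
DATA = """<div class="frame">
    <p class="interesting">alice</p>
    <p class="interesting">bob</p>
    <p class="interesting">carol</p>
    <div>
        <div>
            <h3 class="interesting">david</h3>
        </div>
    </div>
    <div class="frame">
        <p class="interesting">drevil</p>
    </div>
</div>"""


def has_class(el, name):
    # Exact string comparison, like @class="frame" in the XPath.
    return el.get("class") == name


def interesting_with_one_frame_ancestor(root):
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    # First frame div in document order (iter() includes root itself).
    first_frame = next(el for el in root.iter("div") if has_class(el, "frame"))
    result = []
    for el in first_frame.iter():
        if el is first_frame or not has_class(el, "interesting"):
            continue
        # count(ancestor::div[@class="frame"]): walk the whole ancestor chain.
        frames = 0
        node = parent.get(el)
        while node is not None:
            if node.tag == "div" and has_class(node, "frame"):
                frames += 1
            node = parent.get(node)
        if frames == 1:
            result.append(el.text)
    return result


print(interesting_with_one_frame_ancestor(ET.fromstring(DATA)))
# ['alice', 'bob', 'carol', 'david']
```

The inner `while` loop repeats a full ancestor walk for every candidate element, which is where the cost of this approach comes from on large documents.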

By the way, please note that your HTML example is invalid: you should flip the angle bracket after carol :)

mykhal 2009-12-18 02:37:05

Got this to work. I knew I wasn't being creative enough in my use of filters, but I couldn't find any sufficiently complicated examples on the web.

andre_b 2009-12-18 14:28:18

Thanks for your help BTW!

andre_b 2009-12-18 14:28:57

this was really not a simple one :)

mykhal 2009-12-18 23:31:40

Answer 2

A:

mykhal's answer is probably the best you can do in XPath, at least as you've defined the problem.

The trouble with it is that it could be punishingly inefficient when used on large documents with many potentially interesting elements. For every potentially interesting element it finds, it has to examine every node in its ancestor axis.

In XSLT, you can implement a series of templates that find only the elements you're looking for, and that not only visit each element at most once but also never visit elements they don't have to:

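<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
    <output>
       <xsl:apply-templates select="/descendant::*[@class='frame'][1]/*"/>
    </output>
</xsl:template>

<!-- Don't descend into nested frames. -->
<xsl:template match="*[@class='frame']"/>

<xsl:template match="*[@class='interesting']">
   <xsl:copy-of select="."/>
</xsl:template>

<!-- Override the built-in rule that copies text nodes (e.g. whitespace) of uninteresting elements. -->
<xsl:template match="text()"/>

</xsl:stylesheet>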

The built-in template behavior for elements, which is used whenever templates are applied to an element and no higher-ranking template is found, is to apply templates to its children.

The first template finds the ancestor element you're interested in, and applies templates to its child elements.

The second template says, basically, "If you're recursing down the elements and hit an element with a class attribute of 'frame', don't examine its descendants." This keeps the transform from ever even examining an uninteresting element.

And finally, the last template defines what to do when you hit an interesting element - in this case, it copies it to the output in its entirety.

Robert Rossney 2009-12-18 06:33:53
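The pruning idea behind the templates above can be sketched in Python as a plain recursive walk. This is an illustration under assumptions, not the XSLT itself: it uses the standard-library `xml.etree.ElementTree`, the names `DATA`, `has_class`, and `collect_interesting` are hypothetical, and each branch stands in for one rule (nested frame: skip it; interesting element: copy it without descending; anything else: recurse, like the built-in element rule).

```python
import xml.etree.ElementTree as ET

# Same sample markup as in the example earlier in the thread.
DATA = """<div class="frame">
    <p class="interesting">alice</p>
    <p class="interesting">bob</p>
    <p class="interesting">carol</p>
    <div>
        <div>
            <h3 class="interesting">david</h3>
        </div>
    </div>
    <div class="frame">
        <p class="interesting">drevil</p>
    </div>
</div>"""


def has_class(el, name):
    return el.get("class") == name


def collect_interesting(el, out):
    # Hypothetical helper; roughly apply-templates over el's child elements.
    for child in el:
        if has_class(child, "frame"):
            # Empty template for frames: never look inside a nested frame.
            continue
        if has_class(child, "interesting"):
            # copy-of: keep the whole element and do not search below it.
            out.append(child)
            continue
        # Built-in element rule: apply templates to the children.
        collect_interesting(child, out)
    return out


root = ET.fromstring(DATA)
# Like /descendant::*[@class='frame'][1]: the first frame in document order.
first_frame = next(el for el in root.iter() if has_class(el, "frame"))
found = collect_interesting(first_frame, [])
print([el.text for el in found])  # ['alice', 'bob', 'carol', 'david']
```

Each element is visited at most once, and the subtree under the nested frame (drevil) is never entered, which is the efficiency argument made in the answer.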

Your `<xsl:template match="*">` effectively is the built-in default template for elements. For all I know, you could remove it entirely without changing the output of your XSLT.

Tomalak 2009-12-18 13:31:38

I can't actually get this to work; that pesky drevil keeps spoiling the party! But I _do_ get the principle. I was aware of the performance problems with finding the ancestry; indeed, the reason I wanted an XPath expression was so that I could avoid doing it in JavaScript, which would have been even slower. I will need to brush up on my XSLT, but I will post my code at the top of the page anyway.

andre_b 2009-12-18 14:02:17

Fixed both issues alluded to in the comments. `//div[@class='frame'][1]` really means `/descendant-or-self::node()/child::div[@class='frame'][1]` (see the note at the end of section 2.5 of the XPath recommendation). So it was selecting every `div` with a `class` attribute of "frame" that was the first such `div` among its parent's children, which in this example means all of them.

Robert Rossney 2009-12-18 18:21:31
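The difference between the two location paths discussed in the comment above can be emulated by hand. In this sketch (standard-library `xml.etree.ElementTree`; the names `DATA`, `is_frame`, `abbreviated_form`, and `descendant_axis_form` are hypothetical, and both selections are computed manually because ElementTree's own positional predicates follow different rules), the abbreviated form applies `[1]` once per parent, while the `descendant` form applies it once to all frame divs in document order.

```python
import xml.etree.ElementTree as ET

# Reduced version of the question's markup: a frame nested inside a frame.
DATA = """<div class="frame">
    <p class="interesting">alice</p>
    <div class="frame">
        <p class="interesting">drevil</p>
    </div>
</div>"""


def is_frame(el):
    return el.tag == "div" and el.get("class") == "frame"


def abbreviated_form(root):
    # //div[@class='frame'][1], i.e.
    # /descendant-or-self::node()/child::div[@class='frame'][1]:
    # the predicate [1] is applied separately to each parent's children.
    # ElementTree has no document node, so treat [root] as its child list.
    sibling_groups = [[root]] + [list(parent) for parent in root.iter()]
    hits = []
    for siblings in sibling_groups:
        frames = [el for el in siblings if is_frame(el)]
        if frames:
            hits.append(frames[0])
    return hits


def descendant_axis_form(root):
    # /descendant::div[@class='frame'][1]: [1] is applied once,
    # to all frame divs taken together in document order.
    return [el for el in root.iter() if is_frame(el)][:1]


root = ET.fromstring(DATA)
print(len(abbreviated_form(root)))      # 2: the outer frame and the nested one
print(len(descendant_axis_form(root)))  # 1: only the outer frame
```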
