Hi,
I'm trying to put together an XPath expression that gives me all the descendant elements of a node that match a filter (e.g. [contains(@class,"interesting")]) but that don't have a specific ancestor between them and the starting node (e.g. [contains(@class,"frame")]). Probably best explained by example:
<div class="frame">
<p class="interesting">alice</p>
<p class="interesting">bob</p>
<p class="interesting">carol</p>
<div>
<div>
<h3 class="interesting">david</h3>
</div>
</div>
<div class="frame">
<p class="interesting">drevil</p>
</div>
</div>
So in this example, I want to match all the "interesting" elements that are descendants of the first div with class="frame", but not the "interesting" elements underneath the nested "frame" div.
Ideally I'd have a single XPath expression that gives me the elements with content alice, bob, carol and david, but not drevil.
It is like the presence of the nested frame occludes that branch of the tree from the search.
Any ideas? All responses much appreciated.
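For reference, here is a minimal sketch of one expression that seems to capture the "occlusion" idea, run through lxml. The OCCLUDED expression and the interesting_names helper are my own illustrative names, not something from an answer in this thread. It assumes the starting frame has no "frame" ancestor of its own, so every element I want has exactly one "frame" ancestor. It only uses XPath 1.0 features, so it should in principle carry over to a browser, though I have not checked that.

```python
from lxml import etree

# Sample markup from the question, wrapped in a single root element.
SAMPLE = """
<div>
  <div class="frame">
    <p class="interesting">alice</p>
    <p class="interesting">bob</p>
    <p class="interesting">carol</p>
    <div>
      <div>
        <h3 class="interesting">david</h3>
      </div>
    </div>
    <div class="frame">
      <p class="interesting">drevil</p>
    </div>
  </div>
</div>
"""

# Hypothetical expression (an assumption, not from the thread):
# 1. take the first "frame" element in document order,
# 2. look at its "interesting" descendants,
# 3. keep those with exactly one "frame" ancestor (the starting frame).
OCCLUDED = (
    "(//*[contains(@class,'frame')])[1]"
    "//*[contains(@class,'interesting')]"
    "[count(ancestor::*[contains(@class,'frame')]) = 1]"
)


def interesting_names(xml_text):
    """Return the text of each matched element, in document order."""
    root = etree.fromstring(xml_text)
    return [el.text for el in root.xpath(OCCLUDED)]


names = interesting_names(SAMPLE)
print(names)
```

If the starting frame were itself nested inside another frame, the "= 1" would have to become the starting frame's own frame-ancestor count plus one.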
In response to Robert, I have this Python code (though I will ultimately do it browser-side):
from lxml import etree
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
testxml = """
<div>
<div class="frame">
<p class="interesting">alice</p>
<p class="interesting">bob</p>
<p class="interesting">carol</p>
<div>
<div>
<h3 class="interesting">david</h3>
</div>
</div>
<div class="frame">
<p class="interesting">drevil</p>
</div>
</div>
</div>
"""
xsl = """
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<output>
<xsl:apply-templates select="//div[@class='frame'][1]/*"/>
</output>
</xsl:template>
<xsl:template match="*">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="*[@class='frame']"/>
<xsl:template match="*[@class='interesting']">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
"""
def test_xsl():
    # Compile the stylesheet, apply it to the test document, print the result.
    xslt_doc = etree.parse(StringIO(xsl))
    transform = etree.XSLT(xslt_doc)
    doc = etree.parse(StringIO(testxml))
    result = transform(doc)
    print(result)

if __name__ == "__main__":
    test_xsl()
This gives the following result:
<?xml version="1.0"?>
<output>
<p class="interesting">alice</p>
<p class="interesting">bob</p>
<p class="interesting">carol</p>
<h3 class="interesting">david</h3>
<p class="interesting">drevil</p>
</output>
As you can see drevil is lurking.
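A likely reason, as far as I can tell: in //div[@class='frame'][1] the [1] binds to the child step, so it means "the first frame div among its siblings" for every parent, and the nested frame qualifies as well. Its children then get selected directly, which bypasses the empty template. The small check below is only a sketch on a cut-down document; the variable names are mine.

```python
from lxml import etree

# Cut-down version of the test document: one outer frame, one nested frame.
DOC = etree.fromstring("""
<div>
  <div class="frame">
    <p class="interesting">alice</p>
    <div class="frame">
      <p class="interesting">drevil</p>
    </div>
  </div>
</div>
""")

# [1] applies per parent: both the outer and the nested frame are selected.
per_parent = DOC.xpath("//div[@class='frame'][1]")

# Parenthesised: [1] applies to the whole node-set in document order.
first_only = DOC.xpath("(//div[@class='frame'])[1]")

# Direct "interesting" children reached through each form.
leaked = [el.text for el in
          DOC.xpath("//div[@class='frame'][1]/*[@class='interesting']")]
pruned = [el.text for el in
          DOC.xpath("(//div[@class='frame'])[1]/*[@class='interesting']")]

print(len(per_parent), len(first_only))
print(leaked, pruned)
```

That suggests selecting (//div[@class='frame'])[1]/* in the root template would let the empty template for nested frames do its job.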
Note, Tomalak is correct in that the 2nd match on * has no effect (other than to remove spaces from the output which is a bit odd!).
It just twigged, though, that I might not be able to go with the XSLT approach. The whole point of doing an XPath query in the first place was to get references to nodes within the original HTML document. If I do a transform, the nodes in the result document will be copies, not the original ones I'm looking for, and thus no use!
This might be the dumbest question ever, but is there a way to maintain references from nodes in the transformed document back to nodes in the original?
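One way to sidestep the copies problem might be to skip the transform and walk the original tree directly, so that whatever comes back is the original element object. The sketch below uses only the standard library's ElementTree rather than XPath or XSLT. The has_class and collect helpers are hypothetical names of mine, and has_class matches whole class tokens rather than substrings as contains() does.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div>
  <div class="frame">
    <p class="interesting">alice</p>
    <p class="interesting">bob</p>
    <p class="interesting">carol</p>
    <div>
      <div>
        <h3 class="interesting">david</h3>
      </div>
    </div>
    <div class="frame">
      <p class="interesting">drevil</p>
    </div>
  </div>
</div>
"""


def has_class(el, name):
    """True if `name` is one of the whitespace-separated class tokens."""
    return name in el.get("class", "").split()


def collect(el, found=None):
    """Gather 'interesting' descendants of el, skipping nested frames."""
    if found is None:
        found = []
    for child in el:
        if has_class(child, "frame"):
            continue  # a nested frame occludes its whole branch
        if has_class(child, "interesting"):
            found.append(child)
        collect(child, found)
    return found


root = ET.fromstring(SAMPLE)
frame = root.find("div")  # first child div of the root, i.e. the outer frame
hits = collect(frame)
texts = [el.text for el in hits]

# The results are the original nodes, so edits show up in the source tree.
hits[0].set("data-seen", "yes")
print(texts, root.find("div/p").get("data-seen"))
```

The same recursive walk should translate fairly directly to DOM traversal in the browser, though that is untested here.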
Thanks Tomalak, Robert and mykhal for your help so far. I think I just need to buy a book on XSLT...