ansaurus

Question

Find all tags with a specific attribute value

Answer 1


Here's one way, using lxml and the XPath 'descendant::*[@attrib1="yes, this is what we want"]'. The XPath tells lxml to look at all the descendants of the current node and return those with an attrib1 attribute equal to "yes, this is what we want".

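import io

import lxml.html as lh

# Deliberately malformed sample: the lxml.html parser recovers from the broken tags.
content = '''
<html>
    <body>
        <invalid html here/>
        <dont care> ... </dont care>
        <invalid html here too/>
        <interesting attrib1="naah, it is not this"> ... </interesting tag>
        <interesting attrib1="yes, this is what we want">
            <group>
                <line>
                    data
                </line>
            </group>
            <group>
                <line>
                    data1
                <line>
            </group>
            <group>
                <line>
                    data2
                <line>
            </group>
        </interesting>
    </body>
</html>
'''
# lh.parse() expects a file-like object, so wrap the string in StringIO
# (io.StringIO replaces the Python 2-only cStringIO module).
doc = lh.parse(io.StringIO(content))
# Select every descendant element whose attrib1 has exactly this value.
tags = doc.xpath('descendant::*[@attrib1="yes, this is what we want"]')
print(tags)
# [<Element interesting at b767e14c>]
for tag in tags:
    # encoding='unicode' makes tostring() return text instead of bytes on Python 3.
    print(lh.tostring(tag, encoding='unicode'))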
# <interesting attrib1="yes, this is what we want"><group><line>
#                     data
#                 </line></group><group><line>
#                     data1
#                 <line></line></line></group><group><line>
#                     data2
#                 <line></line></line></group></interesting>
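
For comparison, the same attribute predicate can be tried without lxml when the input is well-formed XML. This is only a minimal sketch using the standard library's xml.etree.ElementTree, which supports a limited XPath subset; unlike lxml.html, it will not recover from broken markup such as the sample above. The SAMPLE string and the find_by_attribute helper are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample; element and attribute names mirror the answer.
SAMPLE = """
<root>
    <interesting attrib1="naah, it is not this">skip</interesting>
    <interesting attrib1="yes, this is what we want">
        <group><line>data</line></group>
        <group><line>data1</line></group>
    </interesting>
</root>
"""


def find_by_attribute(root, name, value):
    """Return every descendant of root whose attribute `name` equals `value`."""
    # ElementTree's XPath subset understands [@attr='value'] predicates.
    # The value is interpolated as-is, so it must not contain a single quote.
    return root.findall(".//*[@{}='{}']".format(name, value))


root = ET.fromstring(SAMPLE)
matches = find_by_attribute(root, "attrib1", "yes, this is what we want")
# Collect the text of each <line> element below the matching tags.
lines = [line.text.strip() for m in matches for line in m.iter("line")]
print(len(matches), lines)
```

The predicate matches on the exact attribute value, so the "naah, it is not this" element is skipped in the same way as in the lxml version.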

unutbu 2010-09-23 13:24:43

Thanks, you saved my day!

myle 2010-09-23 16:14:23
