I'm new to lxml (and no more familiar with BeautifulSoup, sadly) and am trying to do something very simple: select elements by their inline style.

<td style="padding: 20px">blah blah </td>

I just want to select all tds where style="padding: 20px", but I can't figure out how. All the examples I've found only show how to select by tag name, such as:

for col in page.cssselect('td'):

but that doesn't help me much.

+1  A: 
import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for col in doc.cssselect('td'):
    # .get() returns None for a td without a style attribute instead of raising KeyError
    style = col.get('style')
    if style == 'padding: 20px':
        print(col.text.strip())

prints

blah blah
buh buh

and manages to skip bow bow.
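
Since this approach already filters in Python, it can be extended to cope with styles that are written differently (no space after the colon, extra declarations). The following is only a sketch: `has_declaration` is a hypothetical helper, not part of lxml, and the extra table cells are made-up sample data.

```python
import lxml.html

# Sample data (assumed): the third cell writes the same padding differently,
# and the fourth cell has no style attribute at all.
data = """<table><tr>
<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding:20px; color: red">buh buh</td>
<td>no style</td>
</tr></table>"""

doc = lxml.html.document_fromstring(data)


def has_declaration(element, prop, value):
    """Hypothetical helper: True if the inline style sets prop to value."""
    style = element.get('style', '')
    for declaration in style.split(';'):
        name, sep, val = declaration.partition(':')
        if sep and name.strip().lower() == prop and val.strip().lower() == value:
            return True
    return False


matches = [td.text_content().strip()
           for td in doc.xpath('//td[@style]')
           if has_declaration(td, 'padding', '20px')]
print(matches)
```

This compares parsed declarations rather than the raw attribute string, so spacing and declaration order no longer matter.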

unutbu
Thanks! Now all I need is for lxml to actually install on a windows machine, and I'm golden!
ropa
hehe, good luck! :)
unutbu
A: 

Well, there's a better way: XPath.

import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for col in doc.xpath("//td[@style='padding: 20px']"):
    print(col.text)

That is neater, and it should also be faster, because the filtering happens inside the XPath engine instead of in a Python loop.
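
An exact comparison on @style only matches when the attribute is written character for character as "padding: 20px". If the spacing may vary or other declarations may be present, something like the following might work. It is a sketch using the standard XPath 1.0 functions translate() and contains(); the sample data is made up, and the substring test would also match a value such as "padding:20px 5px".

```python
import lxml.html

# Sample data (assumed): the third cell has no space after the colon
# and carries an extra declaration.
data = """<table><tr>
<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding:20px; color: red">buh buh</td>
</tr></table>"""

doc = lxml.html.document_fromstring(data)

# translate(@style, ' ', '') deletes every space from the attribute value,
# then contains() looks for the space-free declaration.
expr = "//td[contains(translate(@style, ' ', ''), 'padding:20px')]"
found = [col.text.strip() for col in doc.xpath(expr)]
print(found)
```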

nosklo
+2  A: 

If you prefer to use CSS selectors:

import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for td in doc.cssselect('td[style="padding: 20px"]'):
    print(td.text)
Ruslan Spivak