I am struggling with the syntax required to grab some hrefs in a td. The table, tr and td elements don't have any classes or IDs.
If I wanted to grab the anchor in this example, what would I need?
<tr><td><a>...
Thanks
Something like this?
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2 only)
soup = BeautifulSoup(html)  # html holds your page source as a string
anchors = [td.find('a') for td in soup.findAll('td')]
That should find the first "a" inside each "td" in the HTML you provide (a td with no link gives None). You can tweak td.find to be more specific, or use findAll instead if you have several links inside each td.
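For anyone on Python 3, the same idea in BeautifulSoup 4 (the bs4 package, successor to the BeautifulSoup 3 module imported above) might look like the sketch below. In bs4 the method is spelled find_all; passing href=True keeps only anchors that actually carry an href, so reading a["href"] cannot raise a KeyError. The sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: no classes or ids, several links per cell.
html = """
<table>
  <tr><td><a href="/one">One</a> <a href="/two">Two</a></td></tr>
  <tr><td>no link here</td></tr>
  <tr><td><a name="anchor-without-href">x</a><a href="/three">Three</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# For every <td>, collect the href of every <a> that has one.
# href=True filters out <a> tags with no href attribute.
hrefs_per_td = [
    [a["href"] for a in td.find_all("a", href=True)]
    for td in soup.find_all("td")
]
print(hrefs_per_td)  # [['/one', '/two'], [], ['/three']]

# Or flatten everything into one list of links.
all_hrefs = [href for row in hrefs_per_td for href in row]
print(all_hrefs)  # ['/one', '/two', '/three']
```

Keeping one list per td preserves which cell each link came from; flatten only if you don't need that.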
As per the docs, you first make a parse tree:
import BeautifulSoup
html = "<html><body><tr><td><a href='foo'/></td></tr></body></html>"
soup = BeautifulSoup.BeautifulSoup(html)
and then you search in it, for example for <a> tags whose immediate parent is a <td>:
for ana in soup.findAll('a'):
    # keep only anchors sitting directly inside a <td>
    if ana.parent.name == 'td':
        print(ana["href"])
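If BeautifulSoup isn't installed, the same "immediate parent is a td" check can be sketched with Python 3's standard-library html.parser. This is a swapped-in approach, not BeautifulSoup's API: TdLinkCollector and VOID_ELEMENTS are hypothetical names, and the sketch assumes well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# HTML elements that never have an end tag, so they must not go on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TdLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose immediate parent is a <td>.

    Assumes explicit end tags; markup with omitted end tags
    (e.g. a <td> never closed with </td>) will confuse the stack.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open elements, innermost last
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parent of this tag is whatever element is currently innermost.
        if tag == "a" and self.stack and self.stack[-1] == "td":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if one is open.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


html = "<html><body><tr><td><a href='foo'/></td></tr></body></html>"
parser = TdLinkCollector()
parser.feed(html)
print(parser.hrefs)  # ['foo']
```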