views: 157
answers: 3
Hi,

My local airport's website disgracefully blocks browsers other than IE, and looks awful. I want to write a Python script that fetches the contents of the Arrivals and Departures pages every few minutes and displays them in a more readable form.

My tools of choice are mechanize, to trick the site into believing I use IE, and BeautifulSoup, to parse the page and extract the flights data table.
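For what it's worth, my understanding is that the IE check only inspects the User-Agent header, so faking that header should be enough. A minimal sketch of that part using the standard library (mechanize does the same thing via Browser.addheaders; the URL below is a placeholder, not the real airport site):

```python
import urllib.request

# Placeholder URL -- substitute the airport's actual Arrivals page.
url = "http://example.com/arrivals"

# Sites that sniff for IE usually just inspect the User-Agent header,
# so sending an IE-style string is normally enough to get past the block.
ie_agent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"
req = urllib.request.Request(url, headers={"User-Agent": ie_agent})

# When online, the page body would then be fetched with:
# html = urllib.request.urlopen(req).read()
```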

Quite honestly, I got lost in the BeautifulSoup documentation and can't work out how to extract the table (whose title I know) from the document, or how to get a list of rows from that table.

Any ideas?

Adam

+1  A: 
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)

# the first argument to find is the tag name to search for;
# the second is a dict of attribute->value pairs used to filter
# the matches
table = soup.find("table", {"title": "TheTitle"})

# findAll returns every <tr> in the table (as BeautifulSoup objects),
# which you can then search to pull out the times
rows = table.findAll("tr")
goggin13
+3  A: 

This is not the specific code you need, just a demo of how to work with BeautifulSoup. It finds the table whose id is "Table1" and collects all of its tr elements.

import urllib2
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen(url).read()
bs = BeautifulSoup(html)
table = bs.find(lambda tag: tag.name == 'table' and tag.has_key('id') and tag['id'] == "Table1")
rows = table.findAll(lambda tag: tag.name == 'tr')
Ofri Raviv
That's really cool, I didn't know you could pass lambdas to find.
goggin13
Great indeed! Check your Facebook mailbox, I've sent you a message.
Adam Matan
+1  A: 

In case you care: BeautifulSoup is no longer maintained, and the original maintainer suggests migrating to lxml. XPath should do the trick just nicely.
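To illustrate the XPath idea: lxml gives you full XPath plus a forgiving HTML parser, and the same shape can be sketched with the stdlib's xml.etree.ElementTree and its limited XPath subset (the snippet and the "Table1" id below are made up for the demo):

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for the airport page; real-world HTML
# would need lxml.html, which tolerates broken markup.
html = """<html><body>
<table id="Table1">
  <tr><td>LY001</td><td>08:30</td></tr>
  <tr><td>LY002</td><td>09:45</td></tr>
</table>
</body></html>"""

root = ET.fromstring(html)

# ElementTree supports a small XPath subset: descendant search plus
# attribute predicates.  With lxml this could be a single expression,
# e.g. doc.xpath('//table[@id="Table1"]//tr').
table = root.find(".//table[@id='Table1']")
rows = table.findall(".//tr")
times = [row[1].text for row in rows]
```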

Thanks, that's a really useful piece of information. I'll check lxml.
Adam Matan