views: 157
answers: 3
Hi,

My local airport's website disgracefully blocks browsers other than IE, and looks awful. I want to write a Python script that fetches the contents of the Arrivals and Departures pages every few minutes and displays them in a more readable form.

My tools of choice are mechanize, to trick the site into believing I use IE, and BeautifulSoup, to parse the page and extract the flights data table.
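For what it's worth, my understanding is that the IE check only inspects the User-Agent header, so faking that header should be enough. A minimal sketch of that part using the standard library (mechanize does the same thing via Browser.addheaders; the URL below is a placeholder, not the real airport site):

```python
import urllib.request

# Placeholder URL -- substitute the airport's actual Arrivals page.
url = "http://example.com/arrivals"

# Sites that sniff for IE usually just inspect the User-Agent header,
# so sending an IE-style string is normally enough to get past the block.
ie_agent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"
req = urllib.request.Request(url, headers={"User-Agent": ie_agent})

# When online, the page body would then be fetched with:
# html = urllib.request.urlopen(req).read()
```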

Quite honestly, I got lost in the BeautifulSoup documentation and can't work out how to extract the table (whose title I know) from the document, or how to get a list of rows from that table.

Any ideas?

Adam

+1  A: 
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)

# the first argument to find is the tag name to search for;
# the second is a dict of attribute->value pairs used to filter
# the matches
table = soup.find("table", {"title": "TheTitle"})

# findAll returns every <tr> in the table (as BeautifulSoup objects),
# which you can then search to pull out the times
rows = table.findAll("tr")
goggin13
+3  A: 

This is not the specific code you need, just a demo of how to work with BeautifulSoup. It finds the table whose id is "Table1" and collects all of its tr elements.

import urllib2
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen(url).read()
bs = BeautifulSoup(html)
table = bs.find(lambda tag: tag.name == 'table' and tag.has_key('id') and tag['id'] == "Table1")
rows = table.findAll(lambda tag: tag.name == 'tr')
Ofri Raviv
That's really cool, I didn't know you could pass lambdas to find.
goggin13
Great indeed! Check your Facebook mailbox, I've sent you a message.
Adam Matan
+1  A: 

In case you care: BeautifulSoup is no longer maintained, and the original maintainer suggests migrating to lxml. XPath should do the trick just nicely.
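To illustrate the XPath idea: lxml gives you full XPath plus a forgiving HTML parser, and the same shape can be sketched with the stdlib's xml.etree.ElementTree and its limited XPath subset (the snippet and the "Table1" id below are made up for the demo):

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for the airport page; real-world HTML
# would need lxml.html, which tolerates broken markup.
html = """<html><body>
<table id="Table1">
  <tr><td>LY001</td><td>08:30</td></tr>
  <tr><td>LY002</td><td>09:45</td></tr>
</table>
</body></html>"""

root = ET.fromstring(html)

# ElementTree supports a small XPath subset: descendant search plus
# attribute predicates.  With lxml this could be a single expression,
# e.g. doc.xpath('//table[@id="Table1"]//tr').
table = root.find(".//table[@id='Table1']")
rows = table.findall(".//tr")
times = [row[1].text for row in rows]
```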

Thanks, that's a really useful piece of information. I'll check lxml.
Adam Matan