Hello, sorry if this has already been asked. I have read the related questions, but being quite new to Python I could not work out how to write this request cleanly.

For now I have this minimal Python code:

import re

from mechanize import Browser
from BeautifulSoup import BeautifulSoup

# Fetch the rankings page
br = Browser()
br.open("http://www.atpworldtour.com/Rankings/Singles.aspx")
html = br.response().read()

# Print every link whose href contains "Players"
soup = BeautifulSoup(html)
links = soup.findAll('a', href=re.compile("Players"))
for link in links:
    print link['href']

# Save the raw page; the with block closes the file
with open("rankings.html", "w") as f:
    f.write(html)

It retrieves all the links whose href contains the word "Players".

Now the HTML I need to parse looks something like this:

<tr>
  <td>1</td>
  <td><a href="/Tennis/Players/Top-Players/Roger-Federer.aspx">Federer,&nbsp;Roger</a>&nbsp;(SUI)</td>
  <td><a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=rb">10,550</a></td>
  <td>0</td>
  <td><a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=pa&m=s">19</a></td>
</tr>

The first cell (here 1) contains the rank of the player. I would like to retrieve the following data into a dictionary:

  • rank
  • name of the player
  • link to the detailed page (here /Tennis/Players/Top-Players/Roger-Federer.aspx)

Could you give me some pointers or, if this is easy enough, help me build the piece of code? I am not sure how to formulate the request in Beautiful Soup.

Anthony

+1  A: 

Searching for the players using your method will work, but it will return 3 results per player, because each row contains three links whose href matches "Players". It is easier to search for the table itself and then iterate over its rows (skipping the header row):

table = soup.find('table', 'bioTableAlt')
for row in table.findAll('tr')[1:]:
    cells = row.findAll('td')
    # retrieve data from cells...

To get the data you need, inside the loop:

    rank = cells[0].string
    player = cells[1].a.string
    link = cells[1].a['href']
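
For anyone who wants to try the row-by-row idea without installing anything, here is a minimal sketch that builds one dictionary per row. It is an assumption-laden stand-in rather than the Beautiful Soup code above: it swaps in Python 3's standard-library html.parser, runs on the sample row from the question instead of the live page, and the header row and the RankingParser class name are hypothetical.

```python
from html.parser import HTMLParser

# Sample row copied from the question. The surrounding table and its
# header row are hypothetical, added only so there is a row to skip.
SAMPLE_HTML = """
<table class="bioTableAlt">
<tr><th>Rank</th><th>Player</th></tr>
<tr>
  <td>1</td>
  <td><a href="/Tennis/Players/Top-Players/Roger-Federer.aspx">Federer,&nbsp;Roger</a>&nbsp;(SUI)</td>
  <td><a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=rb">10,550</a></td>
  <td>0</td>
  <td><a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=pa&m=s">19</a></td>
</tr>
</table>
"""


class RankingParser(HTMLParser):
    """Collect one dict (rank, name, link) per row that has <td> cells."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.players = []     # finished rows
        self.row = None       # dict for the row being parsed, or None
        self.cell = -1        # index of the current <td> within the row
        self.in_td = False
        self.in_link = False  # inside the first <a> of the second cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = {"rank": "", "name": "", "link": None}
            self.cell = -1
        elif tag == "td" and self.row is not None:
            self.cell += 1
            self.in_td = True
        elif (tag == "a" and self.in_td and self.cell == 1
              and self.row["link"] is None):
            self.row["link"] = dict(attrs).get("href")
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "td":
            self.in_td = False
        elif tag == "tr" and self.row is not None:
            # Rows with fewer than two <td> cells (e.g. the header) are skipped.
            if self.cell >= 1:
                self.players.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.row is None or not self.in_td:
            return
        if self.cell == 0:
            self.row["rank"] += data.strip()
        elif self.cell == 1 and self.in_link:
            # &nbsp; arrives as U+00A0; turn it into a plain space.
            self.row["name"] += data.replace("\xa0", " ")


parser = RankingParser()
parser.feed(SAMPLE_HTML)
parser.close()
for player in parser.players:
    print(player)
```

On the sample row this yields a single dictionary with rank '1', name 'Federer, Roger' and the link to Roger-Federer.aspx. The rank is kept as a string, as in the snippet above; convert it with int() if you need a number.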
interjay
Thank you for your reply. I would like to validate it, but I am at work at the moment. I will try this tonight at home and then validate your answer!
BlueTrin