Using Python, I'm trying to read the values on http://utahcritseries.com/RawResults.aspx. I can read the page just fine, but am having difficulty changing the value of the year combo box, to view data from other years. How can I read the data for years other than the default of 2002?

The page appears to do an HTTP POST whenever the year combo box changes. The name of the control is ctl00$ContentPlaceHolder1$ddlSeries. I tried setting a value for this control and sending it with urllib.urlencode(postdata), but I must be doing something wrong, because the data on the page does not change. Can this be done in Python?

I'd prefer not to use Selenium, if at all possible.

I've been using code like this (from Stack Overflow user dbr):

import urllib

postdata = {'ctl00$ContentPlaceHolder1$ddlSeries': 9}

src = urllib.urlopen(
    "http://utahcritseries.com/RawResults.aspx",
    data = urllib.urlencode(postdata)
).read()

print src

But it still seems to pull up the same 2002 data. I've tried using Firebug to inspect the headers, and I see a lot of extraneous, random-looking data being sent back and forth. Do I need to post these values back to the server as well?

+1  A: 

Use the excellent mechanize library:

from mechanize import Browser

b = Browser()
b.open("http://utahcritseries.com/RawResults.aspx")
b.select_form(nr=0)

year = b.form.find_control(type='select')
year.get(label='2005').selected = True

src = b.submit().read()
print src

Mechanize is available on PyPI: easy_install mechanize

codeape
Thanks! That worked right out of the box! I'm new to both Python and mechanize, and I wasn't sure where to read up on this. Thanks a bunch!
Neil Kodner
If you need to parse the HTML, you should check out the BeautifulSoup library. Mechanize + BeautifulSoup is terrific for screen-scraping. http://www.crummy.com/software/BeautifulSoup/
codeape
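
On the "random-looking data" seen in Firebug: ASP.NET WebForms pages normally carry hidden form fields such as __VIEWSTATE and __EVENTVALIDATION, and the server generally expects them to be posted back together with the changed control. Posting only the dropdown value, as in the urllib attempt, leaves them out, which is a likely reason the page keeps returning the default year. mechanize's select_form/submit sends those hidden fields automatically. The sketch below is not taken from the thread; it only illustrates the idea with the standard library. It is written for Python 3 (the snippets above are Python 2), and the sample HTML, the class HiddenFieldCollector and the function build_postback are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Keep only named hidden inputs; a missing value becomes "".
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, control_name, control_value):
    """Return a URL-encoded POST body: all hidden fields plus one control."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload[control_name] = control_value
    return urlencode(payload)


# Invented stand-in for the real page; the field values are placeholders.
SAMPLE_HTML = """
<form method="post" action="RawResults.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <select name="ctl00$ContentPlaceHolder1$ddlSeries">
    <option value="1">first series</option>
    <option value="9">another series</option>
  </select>
</form>
"""

body = build_postback(SAMPLE_HTML, "ctl00$ContentPlaceHolder1$ddlSeries", "9")
print(body)
```

Against the live site, the HTML would come from a first GET of the page and the resulting body would be sent as the POST data. A dropdown that posts back on change may also expect __EVENTTARGET to be set to the control's name; that is an assumption about this particular page, not something confirmed in the thread.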