import urllib2, cookielib
import ClientForm
from BeautifulSoup import BeautifulSoup

first_name = "Mona"
last_name = "Sahlin"
url = 'http://www.ratsit.se/BC/Search.aspx'
cookiejar = cookielib.LWPCookieJar()
cookiejar = urllib2.HTTPCookieProcessor(cookiejar)

opener = urllib2.build_opener(cookiejar)
urllib2.install_opener(opener)

response = urllib2.urlopen(url)
forms = ClientForm.ParseResponse(response, backwards_compat=False)

#Use to print out forms if website design changes
for x in forms:
    print x

'''

forms print result:
<aspnetForm POST http://www.ratsit.se/BC/Search.aspx application/x-www-form-urlencoded               <HiddenControl(__VIEWSTATE=/wEPDwULLTExMzU2NTM0MzcPZBYCZg9kFgICAxBkZBYGAgoPDxYCHghJbWFnZVVy....E1haW4kZ3J2U2VhcmNoUmVzdWx0D2dkBRdjdGwwMCRtdndVc2VyTG9naW5MZXZlbA8PZGZkle2yQ/dc9eIGMaQPJ/EEJs899xE=) (readonly)>
<TextControl(ctl00$cphMain$txtFirstName=)>
<TextControl(ctl00$cphMain$txtLastName=)>
<TextControl(ctl00$cphMain$txtBirthDate=)>
<TextControl(ctl00$cphMain$txtAddress=)>
<TextControl(ctl00$cphMain$txtZipCode=)>
<TextControl(ctl00$cphMain$txtCity=)>
<TextControl(ctl00$cphMain$txtKommun=)>  
<CheckboxControl(ctl00$cphMain$chkExaktStavning=[on])>    <ImageControl(ctl00$cphMain$cmdButton=)>
>

'''

#Confirm correct form
form = forms[0]

print form.__dict__

#print form.__dict__.get('controls')

controls = form.__dict__.get('controls')
print "------------------------------------------------------------"
try:
    controls[1] = first_name
    controls[2] = last_name
    page = urllib2.urlopen(form.click('ctl00$cphMain$cmdButton')).read()

''' give error here: The following error occured: "'str' object has no attribute 'name'" '''

#    print controls[9]
        print '----------here-------'
        soup = BeautifulSoup(''.join(page))
        soup = soup.prettify()
A:

Here's a working version:

import urllib2, cookielib
import ClientForm
from BeautifulSoup import BeautifulSoup

first_name = "Mona"
last_name = "Sahlin"
url = 'http://www.ratsit.se/BC/Search.aspx'
cookiejar = cookielib.LWPCookieJar()
cookiejar = urllib2.HTTPCookieProcessor(cookiejar)

opener = urllib2.build_opener(cookiejar)
urllib2.install_opener(opener)

response = urllib2.urlopen(url)
forms = ClientForm.ParseResponse(response, backwards_compat=False)

# Use to print out forms to check if website design changes
for i, x in enumerate(forms):
    print 'Form[%d]: %r, %d controls' % (i, x.name, len(x.controls))
    for j, c in enumerate(x.controls):
        print ' ', j, c.__class__.__name__,
        try: n = c.name
        except AttributeError: n = 'NO NAME'
        print repr(n)

#Confirm correct form
form = forms[0]

controls = form.__dict__.get('controls')
print controls, form.controls

print "------------------------------------------------------------"
try:
    controls[1].value = first_name
    controls[2].value = last_name
    p = form.click('ctl00$cphMain$cmdButton')
    print 'p is', repr(p)
    page = urllib2.urlopen(p).read()
    # The original code raised "'str' object has no attribute 'name'" at the form.click call above

#    print controls[9]
    print '----------here-------'
    soup = BeautifulSoup(''.join(page))
    soup = soup.prettify()
finally:
    print 'ciao!'

The core bug fix (beyond completing a try statement which you probably truncated, to fix the syntax error) is to use

    controls[1].value = first_name
    controls[2].value = last_name

in lieu of your buggy code, which assigned directly to controls[1] and controls[2]. That assignment replaced the control objects in the controls list with plain strings, which is why the by-name search in form.click failed with the "'str' object has no attribute 'name'" error you observed.
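
To see why the direct assignment breaks things, here is a minimal sketch that needs neither ClientForm nor the network. The `Control` class and `find_by_name` helper are hypothetical stand-ins, not ClientForm's real implementation; they only assume that a by-name search reads `.name` on each entry of the controls list. It is written with parenthesized `print` so it runs on Python 3.

```python
class Control(object):
    """Hypothetical stand-in for a form control: a name plus a value."""
    def __init__(self, name, value=""):
        self.name = name
        self.value = value


def find_by_name(controls, name):
    """Hypothetical by-name search: reads .name on every entry in turn."""
    for control in controls:
        if control.name == name:
            return control
    raise LookupError(name)


def make_controls():
    # Control names copied from the printed form in the question.
    return [
        Control("__VIEWSTATE"),
        Control("ctl00$cphMain$txtFirstName"),
        Control("ctl00$cphMain$txtLastName"),
        Control("ctl00$cphMain$cmdButton"),
    ]


# Buggy: the list slot now holds a str instead of a control.
buggy = make_controls()
buggy[1] = "Mona"
try:
    find_by_name(buggy, "ctl00$cphMain$cmdButton")
    buggy_error = None
except AttributeError as exc:
    buggy_error = str(exc)

# Fixed: the control stays in the list; only its value changes.
fixed = make_controls()
fixed[1].value = "Mona"
button = find_by_name(fixed, "ctl00$cphMain$cmdButton")

print(buggy_error)
print(fixed[1].value)
print(button.name)
```

In the buggy case the search reaches the string in slot 1 before it reaches the button, so it fails on the attribute lookup rather than reporting a missing control.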

Alex Martelli
Thank you very much! But it still gives `AttributeError: strip`, and BeautifulSoup does not work. I think it is still the submit button problem in `click()`. What do you think?
dion.yang
Sorry, I made a mistake: I used urllib where I should have used urllib2. Thank you again.
dion.yang
@dion, the code I've given works great (with BeautifulSoup 3.0.9, Python 2.6.4, MacOSX 10.5) even including a `print soup` just before the `finally`. Of course, at that point `soup` is a string (because you've rebound it to the result of a prettify call!) so there isn't much you can do with it anymore beyond printing or saving to file (re-parsing it would be truly silly -- if that's what you need, just **don't** rebind it, of course!-). So whatever's going wrong with your code (edit your Q to show the exact error msg and traceback!) it must be BS 3.1 (don't use it!) or wrong Python.
Alex Martelli
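
The rebinding point in the comment above can be sketched without BeautifulSoup installed. This uses the standard library's `xml.dom.minidom` purely as a stand-in parser (an assumption made for illustration, not what the answer's code uses), and the markup and the `doc` and `pretty` names are made up. The pattern is the same: keep the parsed object under one name and the pretty-printed string under another.

```python
from xml.dom import minidom

# Made-up markup standing in for the fetched page.
markup = "<html><body><p>Mona Sahlin</p></body></html>"

doc = minidom.parseString(markup)        # parsed tree, still navigable
pretty = doc.toprettyxml(indent="  ")    # plain str, good for printing or saving

# Because doc was not rebound, it can still be searched after pretty-printing.
name = doc.getElementsByTagName("p")[0].firstChild.data

print(type(pretty).__name__)
print(name)
```

Had `doc` been rebound to the result of the pretty-printing call, the `getElementsByTagName` lookup would fail, since a string has no such method.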