I have some code that uses mechanize against a password-protected site. I can log in just fine and get the results I expect. However, once I am logged in I don't want to "click" links; I want to iterate through a list of URLs. Unfortunately, each .open() call simply gets redirected to the login page, which is the behaviour I would expect if I had logged out or tried to log in with a different browser. This leads me to believe it is a cookie-handling problem of some sort, but I'm at a loss.

import mechanize

# SITE, PAGES, login_to_BOE() and get_page_statistics() are defined elsewhere.

def main():
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    # The code below works perfectly.
    page_stats = login_to_BOE(browser)
    print(page_stats)

    # This code ALWAYS gets the login page again, NOT the desired
    # behaviour of getting the new URL. This is the behaviour I would
    # expect if I had logged out of our site.
    for page in PAGES:
        url = '%s%s' % (SITE, page)
        print(url)
        response = browser.open(url)
        page_stats = get_page_statistics(response.get_data())
        print(page_stats)

+1  A:

This isn't an answer, but it might lead you in the right direction. Try turning on Mechanize's extensive debugging facilities, using some combination of the statements below:

browser.set_debug_redirects(True)
browser.set_debug_responses(True)
browser.set_debug_http(True)

This will provide a flood of HTTP information, which I found very useful when I developed my one and only Mechanize-based application.
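
A minimal sketch of wiring these switches together, assuming the `browser` argument is a `mechanize.Browser`. The redirect and response messages are assumed here to go through the standard `logging` module under the logger name "mechanize", so a handler is attached before the switches are flipped. `enable_debugging` is a hypothetical helper name, not part of mechanize.

```python
import logging
import sys


def enable_debugging(browser, stream=sys.stdout):
    """Turn on HTTP, redirect and response debugging for a browser.

    `browser` is assumed to be a mechanize.Browser instance; the
    "mechanize" logger name is an assumption about where the library
    sends its redirect and response messages.
    """
    logger = logging.getLogger("mechanize")
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(logging.INFO)

    # The three switches from the statements above.
    browser.set_debug_http(True)
    browser.set_debug_redirects(True)
    browser.set_debug_responses(True)
    return logger
```

Calling `enable_debugging(browser)` once, right after creating the browser, should be enough for the rest of the session.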

I should note that I'm not doing much (if anything) different in my application than what you showed in your question. I create a browser object the same way, then pass it to this login function:

def login(browser):
    browser.open(config.login_url)
    browser.select_form(nr=0)
    browser[config.username_field] = config.username
    browser[config.password_field] = config.password
    browser.submit()
    return browser

I can then open authentication-required pages with browser.open(url), and all of the cookie handling is done transparently and automatically for me.
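
One way to confirm that the session survived is to compare the login URL with the URL the response finally came from. The sketch below assumes the response object exposes `geturl()`, as mechanize and urllib responses do; `landed_on_login_page` and its `login_url` parameter are hypothetical names used for illustration.

```python
def landed_on_login_page(response, login_url):
    """Return True if the response's final URL is the login page.

    Assumes `response` has a geturl() method that returns the URL
    reached after any redirects were followed.
    """
    final_url = response.geturl()
    # Ignore a query string such as ?next=/some/page on the final URL.
    final_path = final_url.split('?', 1)[0]
    return final_path.rstrip('/') == login_url.rstrip('/')
```

After `response = browser.open(url)`, a True result from `landed_on_login_page(response, config.login_url)` would suggest that the server did not treat the request as part of the logged-in session, or that the requested URL was not what the server expected.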

Will McCutchen
+1  A: 

Instead of calling open for each link:

browser.open('http://www.google.com')

Try using the following after doing the initial login:

browser.follow_link(text='a href text')

My guess is that calling open is what is resetting your cookies.

Hawker
+1  A: 

Will,

Your suggestion pointed me in exactly the right direction.

Every web browser I have ever used has handled something like the following correctly:

http://www.foo.com//bar/baz/trool.html

Since I hate getting things concatenated incorrectly, my SITE variable was "http://www.foo.com/".

In addition, all the other URLs were of the form "/bar/baz/trool.html".

My calls to open ended up being .open('http://www.foo.com//bar/baz/trool.html'), and the mechanize browser obviously doesn't massage that the way a "real" browser would. Apache didn't like the URLs.
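
A minimal sketch of building the URLs without the doubled slash, using the standard library's `urljoin` instead of string formatting. `SITE` and `PAGES` are the names from the question; the values are the placeholder URLs used in this answer, and `build_urls` is a hypothetical helper. The import fallback covers both Python 2 (`urlparse`) and Python 3 (`urllib.parse`).

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

SITE = 'http://www.foo.com/'
PAGES = ['/bar/baz/trool.html']


def build_urls(site, pages):
    """Join each page path onto the site root without duplicating '/'."""
    return [urljoin(site, page) for page in pages]


urls = build_urls(SITE, PAGES)
```

Note that a page path starting with "/" replaces any path already present in the site URL, which is fine here because `SITE` is the server root.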

rhacer