I've run into a problem with mechanize following links. Here's a snippet of what I'm aiming to do:

for link in mech.links(url_regex='/test/'):
    mech.follow_link(link)

    # Do some processing on that link

    mech.back()

According to the mechanize examples, this should work just fine, but it doesn't. Despite calling .back(), the loop ends even though there are more links to visit. If I comment out mech.follow_link(link) and mech.back() and replace them with print link.text, it prints all 50 or so links. As soon as I uncomment mech.follow_link, though, the loop terminates immediately after the first follow_link. back() itself is working: if I print mech.title(), call mech.back(), and print mech.title() again, it clearly shows the first page's title and then the 'back' page's title. I'm confused, since this is exactly how it's done in the docs. Not sure what's going on.
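For reference, here's a minimal, self-contained sketch of what I'm doing (the start URL is just a placeholder); in practice the loop stops after the first iteration, as described above:

import mechanize

mech = mechanize.Browser()
mech.open('http://example.com/start')  # placeholder start URL

for link in mech.links(url_regex='/test/'):
    mech.follow_link(link)
    print mech.title()   # title of the followed page

    # Do some processing on that link here

    mech.back()
    print mech.title()   # title of the original page, so back() itself works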

+1  A: 

Pirate, I agree that this shouldn't be happening; you're doing pretty much exactly what the documentation page at wwwsearch.sourceforge.net/mechanize/ says to do. I tried code similar to yours and got the same result: the loop stopped after the first iteration.

I did, however, find a work-around, namely to save the link URLs from links() into a list, and then follow each URL from that list:

from mechanize import Browser

br = Browser()
br.open(your_page_here)

# Collect the matching link URLs first, before doing any navigation
linklist = []
for link in br.links(url_regex='/test/'):
    linklist.append(link.url)

# Then visit each saved URL in turn
for url in linklist:
    br.open(url)
    print br.title()

It's ugly and you shouldn't have to do it, but it seems to work. My guess is that links() yields links from the browser's current page, so navigating away in the middle of the loop breaks the iteration, whereas a pre-collected list of plain URLs isn't affected.
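To map this back to your original loop, the per-link processing just goes inside the second loop; rough sketch, where reading the response body is only a stand-in for whatever processing you actually need:

for url in linklist:
    br.open(url)
    # do your per-page processing here; reading the HTML is just an example
    html = br.response().read()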

I'm not really thrilled with mechanize because of bugginess like this (and a problem I had with it handling two submit buttons poorly), but it's very simple to install, seems pretty portable, and can easily run unattended (via simple cron jobs) compared to other testing frameworks like Selenium (seleniumhq.org), which looks great but seems a lot more involved to actually set up and use.

Chirael