I'm sorry to have to ask something like this, but the documentation for Python's mechanize seems to be really lacking and I can't figure this out. The only example I can find for following a link is:

response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)

But I don't want to use a regex; I just want to follow a link based on its URL. How would I do this? Also, what is the "nr" argument that is sometimes used when following links?

Thanks for any info

+1  A: 

From looking at the code, I suspect you want

response1 = br.follow_link(link=LinkObjectToFollow)

nr is the same as documented under the find_link call.

EDIT: In my first cursory glance, I didn't realize "link" wasn't a simple link.

jkerian
I found the 'nr' info in the code itself: _mechanize.py, in the docstring for find_link, right around line 614.
jkerian
Oh right, I didn't even think that the docs in the source file would differ from the online version, since I'm used to everything also being online. Thanks for the tip.
Rick
+2  A: 

br.follow_link takes either a Link object or a keyword arg (such as nr=0).

br.links() lists all the links.

br.links(url_regex='...') lists all the links whose URLs match the regex.

br.links(text_regex='...') lists all the links whose link text matches the regex.

br.follow_link(nr=num) follows the num-th link on the page, with counting starting at 0. It returns a response object (the same kind that br.open(...) returns).

br.find_link(url='...') returns the Link object whose url exactly equals the given url.

br.find_link, br.links, br.follow_link, br.click_link all accept the same keywords. Run help(br.find_link) to see documentation on those keywords.

Edit: If you have a target url that you wish to follow, you could do something like this:

import mechanize
br = mechanize.Browser()
response=br.open("http://www.example.com/")
target_url='http://www.rfc-editor.org/rfc/rfc2606.txt'
for link in br.links():
    print(link)
    # Link(base_url='http://www.example.com/', url='http://www.rfc-editor.org/rfc/rfc2606.txt', text='RFC 2606', tag='a', attrs=[('href', 'http://www.rfc-editor.org/rfc/rfc2606.txt')])
    print(link.url)
    # http://www.rfc-editor.org/rfc/rfc2606.txt
    if link.url == target_url:
        print('match found')
        # match found            
        break

br.follow_link(link)   # link still holds the last value it had in the loop
print(br.geturl())
# http://www.rfc-editor.org/rfc/rfc2606.txt
unutbu
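The loop above follows whatever link it ended on, even when nothing matched. A minimal sketch of a safer variant, assuming only that each link object exposes a `.url` attribute as shown in the printed output above; `FakeLink` and `first_link_with_url` are hypothetical names used for illustration and are not part of mechanize:

```python
from collections import namedtuple

# Hypothetical stand-in for mechanize's Link objects; only .url is relied on.
FakeLink = namedtuple("FakeLink", ["url", "text"])


def first_link_with_url(links, target_url):
    """Return the first link whose .url equals target_url, or None."""
    for link in links:
        if link.url == target_url:
            return link
    return None


links = [
    FakeLink("http://www.example.com/about", "About"),
    FakeLink("http://www.rfc-editor.org/rfc/rfc2606.txt", "RFC 2606"),
]

match = first_link_with_url(links, "http://www.rfc-editor.org/rfc/rfc2606.txt")
missing = first_link_with_url(links, "http://www.example.com/nope")

print(match.text)
print(missing)
```

With a real browser this would presumably be called as `first_link_with_url(br.links(), target_url)`, and `br.follow_link(link)` would only be called when the result is not None.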
@Rick: If you loop through `br.links()`, you can look at the string `link.url` to figure out if you want to follow it or not. No regex required.
unutbu
Thanks, I think I've got it now. I don't know why, but the version of Python mechanize that I have (the latest) doesn't seem to have much in its doc file. Anyway, thanks for the help; I think I can get it working based on what you said, and I will try.
Rick
I still can't figure out how to get a link to match. I am trying to use the full URL as the regex, but it's not giving a match (when I run the for loop it never enters the loop, implying there are no matches).
Rick
@Rick: Regex is tricky. Characters in your url such as `.*+?()[]` have special meanings in a regex pattern that they don't have in a plain string comparison. Since you have the full url, you can use `==` to compare the url against `link.url`. I've added some code to show what I mean.
unutbu
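To illustrate the point about special characters, here is a small standard-library sketch. The URL is a made-up example, chosen only because it contains `?` and `+`; it shows a URL failing to match itself when used as a raw pattern, and matching once it is passed through `re.escape`:

```python
import re

# Hypothetical URL, picked because it contains regex metacharacters.
url = "http://www.example.com/page.php?id=1+2"

# As a raw pattern, 'p?' means "an optional p" and '1+' means "one or more 1s",
# so the literal '?' and '+' in the string are never matched.
raw_match = re.search(url, url)

# re.escape backslash-escapes the metacharacters so they match literally.
escaped_match = re.search(re.escape(url), url)

print(raw_match is None)
print(escaped_match is not None)
```

If a regex keyword such as `url_regex` has to be used with a full url, escaping it this way first should avoid the problem; the exact-match `url='...'` keyword mentioned in the answer sidesteps it entirely.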
Thanks. I have a lot of regex experience; I think the issue was a problem in my headers. I appreciate your help. I found another way to do it without using regex, so I will post that for reference once I've tested it.
Rick
+1  A: 

I found this way to do it, for reference for anyone who doesn't want to use regex:

r = br.open("http://www.somewebsite.com")
br.find_link(url='http://www.somewebsite.com/link1.html')
req = br.click_link(url='http://www.somewebsite.com/link1.html')
br.open(req)
print(br.response().read())

Or, it will work by the link's text also:

r = br.open("http://www.somewebsite.com")
br.find_link(text='Click this link')
req = br.click_link(text='Click this link')
br.open(req)
print(br.response().read())
Rick
@Rick: I like this solution a lot better than the one I suggested. (I think it even works without the calls to `br.find_link`). Please accept this one so it will bubble to the top.
unutbu
A: 

nr selects exactly which link to follow when more than one link matches the text or url you gave. The default is 0, so by default you follow the first matching link. For example, given this source:

<a href="link.html">Click this link</a>
<a href="link2.html">Click this link</a>

In this example both links have the text "Click this link", but we want to follow link2.html specifically, so we pass nr=1:

br.click_link(text='Click this link', nr=1)

This gives you a request for link2.html. Pass it to br.open(), or call br.follow_link with the same arguments, to get the response.

Gunslinger_
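The selection rule described in this answer can be mimicked in plain Python. This is only a sketch of the idea, not mechanize's own code: `pick_match` is a hypothetical helper, the links are plain dicts, and `LookupError` stands in for whatever error mechanize raises when no link is found.

```python
def pick_match(links, text, nr=0):
    """Return the nr-th (0-based) link whose text equals `text`."""
    matches = [link for link in links if link["text"] == text]
    if nr >= len(matches):
        # Stand-in error for "no such link"; not mechanize's exception type.
        raise LookupError("no matching link at index %d" % nr)
    return matches[nr]


# The two links from the HTML snippet above, as plain dicts.
links = [
    {"url": "link.html", "text": "Click this link"},
    {"url": "link2.html", "text": "Click this link"},
]

first = pick_match(links, "Click this link")         # nr defaults to 0
second = pick_match(links, "Click this link", nr=1)  # second match

print(first["url"])
print(second["url"])
```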