ansaurus

Question

Python mechanize, following link by url and what is the nr parameter?

Answer 1

+1 A:

From looking at the code, I suspect you want

response1 = br.follow_link(link=LinkObjectToFollow)

nr is the same as documented under the find_link call.

EDIT: At first glance, I didn't realize the `link` argument was a Link object rather than a plain URL string.

jkerian 2010-08-25 19:51:27

I found the 'nr' info in the code itself: it's in the docstring for find_link in _mechanize.py, around line 614.

jkerian 2010-08-25 19:57:49

Oh, right. I didn't think the source would have documentation different from the online version, since I'm used to it also being online. Thanks for the tip.

Rick 2010-08-25 20:16:42

Answer 2

+2 A:

br.follow_link takes either a Link object or a keyword arg (such as nr=0).

br.links() lists all the links.

br.links(url_regex='...') lists all the links whose URLs match the regex.

br.links(text_regex='...') lists all the links whose link text matches the regex.

br.follow_link(nr=num) follows the num-th link on the page, counting from 0. It returns a response object (the same kind that br.open(...) returns).

br.find_link(url='...') returns the Link object whose url exactly equals the given url.

br.find_link, br.links, br.follow_link, br.click_link all accept the same keywords. Run help(br.find_link) to see documentation on those keywords.
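
As a rough mental model of how those shared keywords pick a link, here is a stdlib-only sketch. It is not mechanize's code: `LinkCollector` and `select_link` are hypothetical names, it handles only the five keywords mentioned in this thread, and it assumes the regex keywords behave like `re.search` and that `nr` counts only among links that satisfy the other criteria.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (url, text) pairs for every <a href=...>."""

    def __init__(self):
        super().__init__()
        self.links = []     # (url, text) tuples in document order
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def select_link(links, url=None, text=None, url_regex=None, text_regex=None, nr=0):
    """Return the nr-th (0-based) link matching every criterion that was given.

    Assumption: this only mimics the keyword semantics described in the thread;
    mechanize's real filtering supports more keywords than these.
    """
    matches = [
        (u, t) for (u, t) in links
        if (url is None or u == url)
        and (text is None or t == text)
        and (url_regex is None or re.search(url_regex, u))
        and (text_regex is None or re.search(text_regex, t))
    ]
    if nr >= len(matches):
        raise LookupError("no link matches these criteria at index %d" % nr)
    return matches[nr]


collector = LinkCollector()
collector.feed('<a href="a.html">Home</a> '
               '<a href="link.html">Click this link</a> '
               '<a href="link2.html">Click this link</a>')
print(select_link(collector.links, text="Click this link", nr=1))
# ('link2.html', 'Click this link')
```

Note that `nr=1` here means "the second link whose text matches", not "the second link on the page".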

Edit: If you have a target url that you wish to follow, you could do something like this:

import mechanize

br = mechanize.Browser()
response = br.open("http://www.example.com/")
target_url = 'http://www.rfc-editor.org/rfc/rfc2606.txt'
for link in br.links():
    print(link)
    # Link(base_url='http://www.example.com/', url='http://www.rfc-editor.org/rfc/rfc2606.txt', text='RFC 2606', tag='a', attrs=[('href', 'http://www.rfc-editor.org/rfc/rfc2606.txt')])
    print(link.url)
    # http://www.rfc-editor.org/rfc/rfc2606.txt
    if link.url == target_url:
        print('match found')
        # match found
        break
else:
    # The loop ran to the end without hitting break: no link matched.
    raise ValueError('no link with url %r on this page' % target_url)

br.follow_link(link)   # link is the matching Link object found by the loop
print(br.geturl())
# http://www.rfc-editor.org/rfc/rfc2606.txt
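
Relying on the loop variable after `break` only works when a match exists; with no match, the variable still holds the last link on the page. A compact alternative is `next()` with a default. This is a sketch: the `Link` namedtuple below is a stand-in with only `url` and `text` attributes, not mechanize's class, and the sample links are made up.

```python
from collections import namedtuple

# Stand-in for mechanize's Link class (assumption: only .url and .text are
# modelled, since that is all this helper touches).
Link = namedtuple("Link", ["url", "text"])


def first_link_with_url(links, target_url):
    """Return the first link whose .url equals target_url exactly, or None."""
    # next() with a default never reuses a stale loop variable:
    # if nothing matches, the result is simply None.
    return next((link for link in links if link.url == target_url), None)


# Made-up sample data standing in for br.links().
sample = [
    Link("http://www.example.com/about.html", "About"),
    Link("http://www.rfc-editor.org/rfc/rfc2606.txt", "RFC 2606"),
]
match = first_link_with_url(sample, "http://www.rfc-editor.org/rfc/rfc2606.txt")
print(match.text)  # RFC 2606
print(first_link_with_url(sample, "http://www.example.com/missing.html"))  # None
```

With mechanize you would pass `br.links()` instead of `sample`, and call `br.follow_link(match)` only when `match is not None`.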

unutbu 2010-08-25 19:53:46

@Rick: If you loop through `br.links()`, you can look at the string `link.url` to figure out if you want to follow it or not. No regex required.

unutbu 2010-08-25 20:25:26

Thanks, I think I've got it now. I don't know why, but the version of Python mechanize I have (the latest) doesn't seem to have much in its doc file. Anyway, thanks for the help; I think I can get it working based on what you said, and I will try it.

Rick 2010-08-25 20:30:56

I still can't figure out how to get a link to match. I am using the full URL as the regex, but it's not giving a match (when I run the for loop, it never enters the loop body, implying it isn't getting any matches).

Rick 2010-08-25 20:37:16

@Rick: Regex is tricky. Some characters in your url like `.*+?()[]` all have different meanings in the context of a regex pattern as opposed to plain string comparison. Since you have the full url, you can use `==` to compare the url against `link.url`. I've added some code to show what I mean.
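
To make the point about special characters concrete, here is a small check with a made-up URL that contains a query string (`http://www.somewebsite.com/page?id=1` is hypothetical).

```python
import re

# Hypothetical URL with a query string, used only for illustration.
url = "http://www.somewebsite.com/page?id=1"

# Used directly as a pattern, "?" makes the preceding "e" optional and each "."
# matches any character, so the pattern cannot match the literal "page?id=1".
print(re.search(url, url))                    # None

# re.escape backslash-escapes the special characters, so the URL matches literally.
print(bool(re.search(re.escape(url), url)))   # True

# Plain string comparison needs no escaping at all.
print(url == "http://www.somewebsite.com/page?id=1")  # True
```

If you do want to use a regex keyword with a full URL, escaping it first (for example `br.links(url_regex=re.escape(target_url))`) should behave like a literal substring match, assuming mechanize applies the pattern with search semantics.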

unutbu 2010-08-25 20:53:22

Thanks. I have a lot of regex experience; I think the issue was a problem in my headers. I appreciate your help. I found another way to do it without using regex, so I will post that for reference once I've tested it.

Rick 2010-08-25 21:02:36

Answer 3

+1 A:

I found this way to do it, for reference for anyone who doesn't want to use regex:

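import mechanize

br = mechanize.Browser()
br.open("http://www.somewebsite.com")
# Optional check: raises mechanize.LinkNotFoundError if no such link exists
br.find_link(url='http://www.somewebsite.com/link1.html')
# click_link returns a request for the link without opening it
req = br.click_link(url='http://www.somewebsite.com/link1.html')
br.open(req)
print(br.response().read())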

It also works with the link's text:

import mechanize

br = mechanize.Browser()
br.open("http://www.somewebsite.com")
br.find_link(text='Click this link')
req = br.click_link(text='Click this link')
br.open(req)
print(br.response().read())

Rick 2010-08-25 21:10:27

@Rick: I like this solution a lot better than the one I suggested. (I think it even works without the calls to `br.find_link`). Please accept this one so it will bubble to the top.

unutbu 2010-08-26 12:16:05

Answer 4

A:

nr selects exactly which link to follow when the text or URL you specify matches more than one link. It defaults to 0, so by default you follow the first matching link. For example, given this source:

<a href="link.html">Click this link</a>
<a href="link2.html">Click this link</a>

both links have the text "Click this link", but we want to follow link2.html specifically:

req = br.click_link(text='Click this link', nr=1)
response = br.open(req)

This gives you the response for link2.html. (br.click_link only builds the request; br.follow_link(text='Click this link', nr=1) does both steps in one call.)

Gunslinger_ 2010-10-03 12:51:17
