Hello there

I have a program that checks a website, and I want to know how I can make it check the site through a proxy in Python.

Here is the code, just as an example:

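import time
import urllib

# `website` holds the URL to check; it is defined elsewhere in the script.
while True:
    try:
        h = urllib.urlopen(website)
        break
    except Exception:
        # Log the failure with a timestamp, then wait before retrying.
        print '[' + time.strftime('%Y/%m/%d %H:%M:%S') + '] ' + 'ERROR. Trying again in a few seconds...'
        time.sleep(5)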
+4  A: 

By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:

$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py  # Using http://myproxy.example.com:1234 as a proxy
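
If you want to check from inside the script which proxy was picked up from the environment, something like the following may help. It is a rough sketch for Python 3, where the relevant functions live in `urllib.request`; the helper name `proxy_from_environment` is made up for this example.

```python
import urllib.request


def proxy_from_environment(scheme="http"):
    """Return the proxy URL found for this scheme, or None if there is none."""
    # getproxies() reads variables such as http_proxy and https_proxy;
    # on some platforms it falls back to the system proxy settings.
    return urllib.request.getproxies().get(scheme)


if __name__ == "__main__":
    print("HTTP proxy from environment: %s" % proxy_from_environment())
```

Run with `http_proxy` exported as above, this should report the same proxy URL you set in the shell.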

If you instead want to specify a proxy inside your application, you can give a proxies argument to urlopen:

proxies = {'http': 'http://myproxy.example.com:1234'}
print "Using HTTP proxy %s" % proxies['http']
urllib.urlopen("http://www.google.com", proxies=proxies)
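
Note that the snippet above is Python 2. In Python 3, `urllib.request.urlopen` does not take a `proxies` argument, so one way to get the same effect is to build an opener around a `ProxyHandler`. A minimal sketch, assuming Python 3; the function name `make_proxy_opener` is hypothetical and the proxy URL is the same placeholder as above:

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build an opener that sends HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    # Passing our own ProxyHandler replaces the default, environment-based one.
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    proxy = "http://myproxy.example.com:1234"  # placeholder proxy
    opener = make_proxy_opener(proxy)
    print("Using HTTP proxy %s" % proxy)
    # opener.open("http://www.google.com") would now go through the proxy.
```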

Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print "Trying HTTP proxy %s" % proxy
    try:
        result = urllib.urlopen("http://www.google.com",
                                proxies={'http': proxy})
        print "Got URL using proxy %s" % proxy
        break
    except Exception:
        print "Trying next proxy in 5 seconds"
        time.sleep(5)
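
For anyone on Python 3, here is a rough equivalent of the loop above. It is only a sketch: the function names `open_via_proxy` and `fetch_with_first_working_proxy` and the `fetch`/`delay` parameters are invented for this example, and the exceptions caught are my assumption about what a dead proxy typically raises.

```python
import http.client
import time
import urllib.request


def open_via_proxy(url, proxy, timeout=10):
    """Fetch url through a single HTTP proxy and return the response body."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def fetch_with_first_working_proxy(url, candidate_proxies,
                                   fetch=open_via_proxy, delay=5):
    """Try each proxy in order; return (proxy, body) for the first success."""
    for proxy in candidate_proxies:
        print("Trying HTTP proxy %s" % proxy)
        try:
            body = fetch(url, proxy)
        except (OSError, http.client.HTTPException) as exc:
            # URLError and socket timeouts are OSError subclasses.
            print("Proxy %s failed (%s); trying next proxy in %s seconds"
                  % (proxy, exc, delay))
            time.sleep(delay)
        else:
            print("Got URL using proxy %s" % proxy)
            return proxy, body
    # No candidate worked.
    return None, None
```

The `fetch` parameter is there so the loop can be exercised with a stand-in function instead of real proxies; in normal use you would leave it at its default.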
Pär Wieslander
Using your example, how can I print which proxy is in use at the time urlopen is called?
Shady
@Shady: Just throw in a `print` statement that prints the value of `proxies['http']`. Take a look at my updated example to see how it could be done.
Pär Wieslander
OK, thanks. But what if I want more proxies, say 10 of them, trying one after the other?
Shady
@Shady: You mean that you want to try a new proxy for each call until you find one that works? Change the `proxies` argument for each call to `urlopen`, passing in a new proxy for each call.
Pär Wieslander
Actually, I want to check the website with several proxies, say 10, and then repeat the process with the same proxies. But the question here is HOW I can print which proxy urlopen is using at the time of the check.
Shady
@Shady: I've added another example that uses several proxies. Is this what you're looking for?
Pär Wieslander
Yes, thank you... now I just need a really good proxy list =p
Shady
Wieslander, I'm getting an error for every proxy I use. What could it be?
Shady
@Shady: That's impossible to tell without more details. I would start by verifying that the proxies actually work by trying them out in a web browser first. If they **don't** work in the browser either, then the problem is probably with the proxies or in the network. If they **do** work in the browser, you'll probably have to double check that you're actually passing the proxy settings correctly to `urlopen`.
Pär Wieslander
Wieslander, I've just tested the proxy and it worked in Firefox. I got it from here (http://www.samair.ru/proxy/time-01.htm). Could you take a look at my script to see what is happening? I would appreciate it =) (http://pastebin.com/TgZw7xvV)
Shady