I have a list of 100 proxies, and the URL I am interested in is abc.com. I want to check how many of the proxies can successfully fetch this URL and how long each one takes. I hope that makes sense. I am a Python noob and am looking for a code snippet. A helping hand is really appreciated :)

Proxies:

200.43.54.212
200.43.54.212
200.43.54.212
200.43.54.212

URL:

abc.com

Desired result:

Proxy          isGood  Time
200.43.54.112  n       23.12
200.43.54.222  n       12.34
200.43.54.102  y       11.09
200.43.54.111  y        8.85

P.S.: Each of the proxies above uses port 80 or 8080.

A:

You can fetch URLs with urllib2 (Python 2) and time each fetch with the time module. Here's a simple example that does what you seem to want:

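```python
import httplib
import socket
import time
import urllib2

# Give up on an unresponsive proxy after 10 seconds instead of hanging.
# socket.setdefaulttimeout() works on Python 2.5; on 2.6+ you can pass
# timeout=10 to urllib2.urlopen() instead.
socket.setdefaulttimeout(10)


def testProxies(url, proxies):
    # header row, aligned with the "%-20s" proxy column used below
    results = ["%-20s isGood Time" % "Proxy"]
    for proxy in proxies:
        # build a fresh request for each proxy and route it through that proxy;
        # proxy is "host" or "host:port", e.g. "200.43.54.112:8080"
        req = urllib2.Request(url)
        req.set_proxy(proxy, "http")
        # time it
        start = time.time()
        try:
            # open the URL and read the body so the time covers the whole fetch
            response = urllib2.urlopen(req)
            response.read()
            response.close()
            # format the results for success
            results.append("%-20s y      %.2f" % (proxy, time.time() - start))
        except (urllib2.URLError, socket.error, httplib.HTTPException):
            # refused, timed out, HTTP error status, bad reply: a failure
            results.append("%-20s n      %.2f" % (proxy, time.time() - start))
    return results


testResults = testProxies("http://www.abc.com", ["200.43.54.112", "200.43.54.222",
                                                 "200.43.54.102", "200.43.54.111"])
for result in testResults:
    print result
```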

The key parts are building the request with `urllib2.Request(url)` and calling its `set_proxy()` method, which sends that request through the given proxy instead of connecting to the server directly.
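On Python 3, urllib2 was split into `urllib.request` and `urllib.error`, and the usual way to route a request through a proxy is an opener built with `ProxyHandler`. The following is a rough sketch of the same check under those assumptions. The names `check_proxy` and `check_proxies`, the 10-second default timeout, and the choice to download the whole body before stopping the clock are illustrative, not part of the answer above. Proxies are given as `"host:port"` strings.

```python
import http.client
import time
import urllib.request


def check_proxy(url, proxy, timeout=10.0):
    """Fetch url through the HTTP proxy "host:port"; return (is_good, seconds)."""
    # An explicit ProxyHandler replaces the default one, so environment
    # proxy settings are ignored and http:// URLs go through this proxy.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": "http://" + proxy}))
    start = time.perf_counter()
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read()  # download the body so the timing covers the whole fetch
        is_good = True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are OSError subclasses;
        # HTTPException covers malformed replies from a broken proxy
        is_good = False
    return is_good, time.perf_counter() - start


def check_proxies(url, proxies, timeout=10.0):
    """Return a header line plus one "proxy  y/n  seconds" line per proxy."""
    lines = ["%-22s isGood Time" % "Proxy"]
    for proxy in proxies:
        is_good, elapsed = check_proxy(url, proxy, timeout)
        lines.append("%-22s %-6s %.2f" % (proxy, "y" if is_good else "n", elapsed))
    return lines
```

For example, `check_proxies("http://www.abc.com/", ["200.43.54.112:80"])` returns the lines to print. The port in that call is a placeholder. With 100 proxies checked one after another, the timeout caps the worst-case total at roughly 100 × 10 seconds.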

Daniel G
Thank you so much for taking the time to actually code this from the sample data. I really, really appreciate it :) How do I take the port number into account? Every proxy has a port associated with it. Thanks again!
ThinkCode
You can simply add the port number to the end of each proxy address, separated by a colon; e.g. `"200.43.54.112:80"` for port 80.
Daniel G
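If the list of 100 proxies mixes bare addresses with `host:port` entries, a small helper can bring every entry into the `"host:port"` form before it is handed to `set_proxy()` (or to a `ProxyHandler` on Python 3). This is a Python 3 sketch. The name `normalize_proxy` is hypothetical, and so is the default of port 80, chosen because the question says each proxy uses 80 or 8080.

```python
from urllib.parse import urlsplit


def normalize_proxy(proxy, default_port=80):
    """Return proxy as "host:port", adding default_port when no port is given."""
    # Prefixing "//" makes urlsplit treat the string as a network location,
    # so .hostname and .port are parsed for us; .port raises ValueError
    # if the port is not a number in the range 0-65535.
    parts = urlsplit("//" + proxy.strip())
    port = parts.port if parts.port is not None else default_port
    return "%s:%d" % (parts.hostname, port)
```

For instance, `normalize_proxy("200.43.54.112")` gives `"200.43.54.112:80"`, while an entry that already carries a port, such as `"200.43.54.112:8080"`, is returned unchanged.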
Yep, I was trying with the port as you mentioned, but I'm stuck with this error: "NameError: global name 'URLError' is not defined". Not sure what is wrong; I'm using your code as is.
ThinkCode
Sorry about that, I mistyped it. It should work now. URLError lives in the urllib2 module, so with `import urllib2` it has to be written `urllib2.URLError`.
Daniel G
Thanks, that fixed the URLError. Now I get: `File "proxyCheck.py", line 18, in testProxies results.append("{0} y {1:0.2f}".format(proxy, time.time()-start)) AttributeError: 'str' object has no attribute 'format'`. Also, shouldn't `req.set_proxy("200.43.54.112", "http")` be `req.set_proxy(proxy, "http")`?
ThinkCode
Sorry, I made a mistake copying the code. The problem with `.format()` is that `str.format()` was only added in Python 2.6. I've switched the code to the older `%` string formatting, so it *should* work now. I hope :-D
Daniel G
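The two formatting styles build the same result line. The following sketch uses Python 3 syntax and a made-up elapsed time of 23.1234 seconds. It puts the `%` form used in the answer next to the `str.format()` form that failed on Python 2.5.

```python
proxy, elapsed = "200.43.54.112", 23.1234  # illustrative values only

# printf-style % formatting: available in every Python 2 release, including 2.5
old_style = "%s  y      %.2f" % (proxy, elapsed)

# str.format(): added in Python 2.6, so on 2.5 it raises AttributeError
new_style = "{0}  y      {1:.2f}".format(proxy, elapsed)

print(old_style)               # 200.43.54.112  y      23.12
print(old_style == new_style)  # True
```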
You, sir, are a master! It was my fault actually: you were using 2.6+ and I am using 2.5. Thanks a tonne :)
ThinkCode