I am trying to write a function in Python to use a public anonymous proxy and fetch a webpage, but I got a rather strange error.
The code (I have Python 2.4):

import urllib2    
def get_source_html_proxy(url, pip, timeout):
    # timeout: maximum number of seconds the code is willing to wait before
    # giving up on a proxy that is not working
    proxy_handler = urllib2.ProxyHandler({'http': pip})
    opener = urllib2.build_opener(proxy_handler)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    urllib2.install_opener(opener)
    req=urllib2.Request(url)
    sock=urllib2.urlopen(req)
    timp = 0  # a counter that roughly measures the time until the webpage is returned
    timpLimita = 50000000 * timeout  # 5 million is about 1 second
    while 1:
        data = sock.read(1024)
        timp = timp + 1
        if len(data) < 1024: break
        if timp == timpLimita:
            break
    if timp == timpLimita:
        print pip + ": Connection is working, but the webpage is fetched in more than " + str(timeout) + " seconds. This proxy returns the following IP: " + str(data)
        return str(data)
    else:
        print "This proxy " + pip + "= good proxy. " + "It returns the following IP: " + str(data)
        return str(data)
# Now, I call the function to test it for one single proxy (IP:port) that does not support user and password (a public high anonymity proxy)
#(I put a proxy that I know is working - slow, but is working)
rez=get_source_html_proxy("http://www.whatismyip.com/automation/n09230945.asp", "93.84.221.248:3128", 50)
print rez

The error:

Traceback (most recent call last):
  File "./public_html/cgi-bin/teste5.py", line 43, in ?
    rez=get_source_html_proxy("http://www.whatismyip.com/automation/n09230945.asp", "xx.yy.zzz.ww:3128", 50)
  File "./public_html/cgi-bin/teste5.py", line 18, in get_source_html_proxy
    sock=urllib2.urlopen(req)
  File "/usr/lib64/python2.4/urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "/usr/lib64/python2.4/urllib2.py", line 358, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.4/urllib2.py", line 376, in _open
    '_open', req)
  File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.4/urllib2.py", line 573, in <lambda>
    lambda r, proxy=url, type=type, meth=self.proxy_open: \
  File "/usr/lib64/python2.4/urllib2.py", line 580, in proxy_open
    if '@' in host:
TypeError: iterable argument required

I do not know why the character "@" is an issue: there is none in my code. Should there be?
Thanks in advance for your valuable help.

A: 

The @ itself is a red herring. The traceback comes from the fact that it is trying to execute an `x in host` operation, and in that context host has to be iterable (such as a string). You'll want to inspect the value of host there; it is probably something like None or a number, not what you meant.
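
To illustrate the point about `in` needing an iterable on its right-hand side, here is a minimal sketch. The helper name `contains_at` is hypothetical, and the `except ... as` syntax assumes a newer Python than 2.4; the exact wording of the error message also differs between Python versions.

```python
def contains_at(value):
    """Evaluate "'@' in value"; return the error text if value is not iterable."""
    try:
        return '@' in value
    except TypeError as exc:
        # This is the same kind of failure as "if '@' in host:" in the traceback.
        return "TypeError: %s" % exc

print(contains_at("user:password@10.0.0.1:3128"))  # a string containing '@'
print(contains_at("10.0.0.1:3128"))                # a string without '@'
print(contains_at(None))                           # None is not iterable
```

The first two calls work because strings support `in`; the third shows what happens when host ends up as None.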

keturn
Thanks, but what host? The IP:port of proxy? Or the URL?
carmao
A debugger can show you more details in the traceback. Try winpdb, Wing IDE, or ipython (with `%xmode verbose` and `%debug`).
keturn
Thanks for the hints.
carmao
Another source indicates there is an issue with proxy_handler = urllib2.ProxyHandler({'http': pip}), as it is missing "http://". So it seems it should be: proxy_handler = urllib2.ProxyHandler({'http': 'http://' + pip})
carmao
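
The fix suggested in the comment above could be sketched roughly as follows. The helper names `normalize_proxy` and `build_proxy_opener` are hypothetical, and the try/except import is only an assumption so that the sketch runs with either `urllib2` or its later replacement `urllib.request`. No network request is made here; the code only builds the opener.

```python
try:
    import urllib2 as request_mod          # older Python, as in the question
except ImportError:
    import urllib.request as request_mod   # newer Python

def normalize_proxy(pip):
    """Prepend 'http://' when the proxy is given as a bare IP:port string."""
    if "://" in pip:
        return pip
    return "http://" + pip

def build_proxy_opener(pip):
    """Build an opener that sends http requests through the given proxy."""
    proxy_handler = request_mod.ProxyHandler({'http': normalize_proxy(pip)})
    # Handlers are passed to build_opener as positional arguments.
    opener = request_mod.build_opener(proxy_handler)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    return opener

opener = build_proxy_opener("93.84.221.248:3128")
print(normalize_proxy("93.84.221.248:3128"))
```

With the scheme in place, the proxy URL has a host part for the library to parse, which is presumably what was missing when host could not be searched for "@".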
+2  A: 

urllib2.build_opener takes a list of handlers

opener = urllib2.build_opener([proxy_handler])
gnibbler
Thanks. I need to learn more. So far I have taken code from here and there, but never understood it as I should. I need to go back to the basics.
carmao