I installed Python 2.6.2 earlier on a Windows XP machine and ran the following code:

import urllib2
import urllib

page = urllib2.Request('http://www.python.org/fish.html')
urllib2.urlopen( page )

I get the following error.

Traceback (most recent call last):
  File "C:\Python26\test3.py", line 6, in <module>
    urllib2.urlopen( page )
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 383, in open
    response = self._open(req, data)
  File "C:\Python26\lib\urllib2.py", line 401, in _open
    '_open', req)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 1130, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python26\lib\urllib2.py", line 1105, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 11001] getaddrinfo failed>
+3  A: 
import urllib2
response = urllib2.urlopen('http://www.python.org/fish.html')
html = response.read()

You're doing it wrong.

mcandre
Now I get this error:

Traceback (most recent call last):
...
    '_open', req)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 1130, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python26\lib\urllib2.py", line 1105, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Thanks for the help.
DJDonaL3000
That could be because the URL you gave doesn't exist (try visiting it). Use something else that does.
mcandre
Downvoted because it doesn't address the real problem. I am using 2.6.1 on WinXP (exact same urllib2.py, I checked) and when I execute DJDonaL3000's code, I get the expected urllib2.HTTPError: HTTP Error 404: Not Found.
John Y
+1  A: 

Windows Vista, Python 2.6.2

It's a 404 page, right?

>>> import urllib2
>>> import urllib
>>>
>>> page = urllib2.Request('http://www.python.org/fish.html')
>>> urllib2.urlopen( page )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
>>>
hughdbrown
A: 

DJ

First, I see no reason to import urllib; I've only ever seen urllib2 used to replace urllib entirely and I know of no functionality that's useful from urllib and yet is missing from urllib2.

Next, I notice that http://www.python.org/fish.html gives a 404 error for me. (That doesn't explain the traceback/exception you're seeing; I get urllib2.HTTPError: HTTP Error 404: Not Found.)

Normally, if you just want to do a default fetch of a web page (without adding special HTTP headers, doing any sort of POST, etc.), then the following suffices:

req = urllib2.urlopen('http://www.python.org/')
html = req.read()
# and req.close() if you want to be pedantic
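A minimal error-handling sketch around the same call, separating HTTP-level failures from network-level ones (the fallback imports are for Python 3, where these names moved to urllib.request and urllib.error):

```python
try:
    import urllib2                            # Python 2, as in the question
    from urllib2 import HTTPError, URLError
except ImportError:
    from urllib import request as urllib2     # Python 3 equivalents
    from urllib.error import HTTPError, URLError

def fetch(url):
    """Fetch url, distinguishing HTTP errors (the server replied with an
    error status) from network errors (DNS failure, refused connection)."""
    try:
        return urllib2.urlopen(url).read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404.
        print('HTTP error %d fetching %s' % (e.code, url))
    except URLError as e:
        # Never reached a server: [Errno 11001] getaddrinfo failed lands here.
        print('Network error: %s' % e.reason)
```

Note that HTTPError is a subclass of URLError, so it must be caught first; with the order reversed, every 404 would be reported as a generic network error.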
Jim Dennis
+2  A: 

Have a look in the urllib2 source, at the line specified by the traceback:

  File "C:\Python26\lib\urllib2.py", line 1105, in do_open
    raise URLError(err)

There you'll see the following fragment:

    try:
        h.request(req.get_method(), req.get_selector(), req.data, headers)
        r = h.getresponse()
    except socket.error, err: # XXX what error?
        raise URLError(err)

So it looks like the cause is a socket error, not an HTTP-protocol-related error. Possible reasons: you are not online, you are behind a restrictive firewall, your DNS is down, ...

All this is aside from the fact that, as mcandre pointed out, your code is wrong.
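Errno 11001 on Windows is specifically a DNS-resolution failure (WSAHOST_NOT_FOUND). You can check name resolution directly, independent of urllib2, with a small sketch like this (the host names are just examples):

```python
import socket

def can_resolve(host):
    """Return True if DNS can resolve the host name.
    socket.gaierror is the 'getaddrinfo failed' error from the traceback."""
    try:
        socket.getaddrinfo(host, 80)
        return True
    except socket.gaierror:
        return False

print(can_resolve('localhost'))             # resolves via the hosts file
print(can_resolve('no-such-host.invalid'))  # reserved TLD, never resolves
```

If even known-good names fail to resolve here, the problem is your network/DNS setup, not urllib2.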

krawyoti