I have a simple website I'm testing. It's running on localhost and I can access it in my web browser. The index page is simply the word "running". urllib.urlopen will successfully read the page but urllib2.urlopen will not. Here's a script which demonstrates the problem (this is the actual script and not a simplification of a different test script):

import urllib, urllib2
print urllib.urlopen("http://127.0.0.1").read()  # prints "running"
print urllib2.urlopen("http://127.0.0.1").read() # throws an exception

Here's the stack trace:

Traceback (most recent call last):
  File "urltest.py", line 5, in <module>
    print urllib2.urlopen("http://127.0.0.1").read()
  File "C:\Python25\lib\urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "C:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 412, in error
    result = self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 575, in http_error_302
    return self.parent.open(new)
  File "C:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 504: Gateway Timeout

Any ideas? I might end up needing some of the more advanced features of urllib2, so I don't want to just resort to using urllib, plus I want to understand this problem.

+1  A: 

Does calling urllib2.urlopen first, followed by urllib.urlopen, give the same results? I'm wondering whether the first call is keeping the HTTP server busy and causing the timeout.

Sijin
Nope, urllib2 gets the error regardless of whether it's called first, and urllib never gets the error even when it's called multiple times. Good thoughts though.
Eli Courtwright
+1  A: 

I know this answer sucks, but "it works fine on my machine" (WinXP with Python 2.5.2)

Corey Goldberg
I'm also running on Windows XP with Python 2.5.2, so that's interesting. Thanks for giving it a shot.
Eli Courtwright
+8  A: 

Sounds like you have proxy settings defined that urllib2 is picking up on. When it tries to proxy "127.0.0.1", the proxy gives up and returns a 504 error.

From http://kember.net/articles/216/obscure-python-urllib2-proxy-gotcha:

proxy_support = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxy_support)
print opener.open("http://127.0.0.1").read()

# Optional - makes this opener default for urlopen etc.
urllib2.install_opener(opener)
print urllib2.urlopen("http://127.0.0.1").read()
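
If it's unclear where the proxy setting comes from, it may help to print what the standard library detects. getproxies() looks at the *_proxy environment variables first and, on Windows, falls back to the Internet settings stored in the registry, so a proxy can be picked up even when no environment variable is set. A minimal sketch, not a definitive fix: make_direct_opener is a hypothetical helper name, and the try/except import is an addition so the snippet also runs on Python 3, where the same functions live in urllib.request.

```python
try:
    # Python 2, as used in the question
    from urllib import getproxies
    from urllib2 import ProxyHandler, build_opener
except ImportError:
    # Assumption: Python 3 equivalents, not part of the original thread
    from urllib.request import getproxies, ProxyHandler, build_opener


def make_direct_opener():
    """Hypothetical helper: return an opener that never uses a proxy."""
    # An explicit empty dict stops ProxyHandler from calling getproxies().
    return build_opener(ProxyHandler({}))


# Dictionary of scheme -> proxy URL that a default opener would use.
detected = getproxies()
print(detected)

opener = make_direct_opener()
# With the test server running, this would connect directly:
# print(opener.open("http://127.0.0.1").read())
```

An empty dictionary printed here would suggest that no proxy was detected and the cause lies elsewhere.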
John Millikin
This fixed the problem, though I have no idea how or why it thought to use a proxy, since my script was only three lines long and I have no environment variables which indicate anything about any proxy. Still, it's good to have this resolved, so thanks for the help.
Eli Courtwright
+1  A: 

I don't know what's going on, but you may find this helpful in figuring it out:

>>> import urllib2
>>> urllib2.urlopen('http://mit.edu').read()[:10]
'<!DOCTYPE '
>>> urllib2._opener.handlers[1].set_http_debuglevel(100)
>>> urllib2.urlopen('http://mit.edu').read()[:10]
connect: (mit.edu, 80)
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: mit.edu\r\nConnection: close\r\nUser-Agent: Python-urllib/2.5\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 14 Oct 2008 15:52:03 GMT
header: Server: MIT Web Server Apache/1.3.26 Mark/1.5 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7c
header: Last-Modified: Tue, 14 Oct 2008 04:02:15 GMT
header: ETag: "71d3f96-2895-48f419c7"
header: Accept-Ranges: bytes
header: Content-Length: 10389
header: Connection: close
header: Content-Type: text/html
'<!DOCTYPE '
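
The session above reaches into the private urllib2._opener and assumes the HTTP handler sits at index 1, which may not hold on every version. A sketch of the same idea through the public API, assuming HTTPHandler accepts a debuglevel argument as it does in the standard library. Adding an empty ProxyHandler and the Python 3 import fallback are my additions, not part of the original session.

```python
try:
    # Python 2, as used in the question
    from urllib2 import HTTPHandler, ProxyHandler, build_opener
except ImportError:
    # Assumption: Python 3 equivalents, not part of the original thread
    from urllib.request import HTTPHandler, ProxyHandler, build_opener

# debuglevel=1 makes the underlying HTTP connection print the request
# it sends and the reply headers it receives.
debug_handler = HTTPHandler(debuglevel=1)

# ProxyHandler({}) skips proxy detection, so any trace reflects a direct
# connection to the target host.
debug_opener = build_opener(debug_handler, ProxyHandler({}))

# With the test server running, this would print the send/reply trace:
# debug_opener.open("http://127.0.0.1").read()
```

Dropping the ProxyHandler({}) argument should show in the trace whether the request is going to a proxy instead of 127.0.0.1.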
fivebells
+1  A: 

urllib.urlopen() throws the following request at the server:

GET / HTTP/1.0
Host: 127.0.0.1
User-Agent: Python-urllib/1.17

while urllib2.urlopen() throws this:

GET / HTTP/1.1
Accept-Encoding: identity
Host: 127.0.0.1
Connection: close
User-Agent: Python-urllib/2.5

So, your server either doesn't understand HTTP/1.1 or can't handle the extra header fields.
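
One way to test this hypothesis is to send both requests by hand over a raw socket and compare the status lines. A rough sketch: build_request and send_raw are hypothetical helper names, port 80 is assumed from the URL in the question, and the header values are copied from the two requests above.

```python
import socket


def build_request(version, headers):
    """Hypothetical helper: build a raw GET request for "/"."""
    lines = ["GET / HTTP/" + version]
    lines += [name + ": " + value for name, value in headers]
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def send_raw(host, port, request_text):
    """Hypothetical helper: send the request, return the reply's status line."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(10)
    try:
        sock.connect((host, port))
        sock.sendall(request_text.encode("ascii"))
        reply = sock.recv(4096).decode("latin-1")
    finally:
        sock.close()
    return reply.split("\r\n", 1)[0]


# The request urllib.urlopen() sends, per the capture above.
URLLIB_REQUEST = build_request("1.0", [
    ("Host", "127.0.0.1"),
    ("User-Agent", "Python-urllib/1.17"),
])

# The request urllib2.urlopen() sends, per the capture above.
URLLIB2_REQUEST = build_request("1.1", [
    ("Accept-Encoding", "identity"),
    ("Host", "127.0.0.1"),
    ("Connection", "close"),
    ("User-Agent", "Python-urllib/2.5"),
])

# With the test server running (port 80 assumed):
# print(send_raw("127.0.0.1", 80, URLLIB_REQUEST))
# print(send_raw("127.0.0.1", 80, URLLIB2_REQUEST))
```

Because the socket connects straight to the server, both requests bypass any proxy. If both come back with a 200 status line, the server handles HTTP/1.1 and the extra headers, and the difference more likely lies somewhere between the client and the server.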

Deestan