I'm writing a Python URL grabber. For my purposes, I want it to time out really fast, so I'm doing:

urllib2.urlopen("http://.../", timeout=2)

Of course it times out correctly as it should. However, it doesn't bother to close the connection to the server, so the server thinks the client is still connected. How can I ask urllib2 to just close the connection after it times out?

Running gc.collect() doesn't work, and I'd like to avoid httplib if I can help it.

The closest I can get is: the first try will time out. The server reports that the connection closed just as the second try times out. Then, the server reports the connection closed just as the third try times out. Ad infinitum.

Many thanks.

A: 

This is SUCH a hack, but the following code works. If the request is in another function AND it does not raise an exception, then the socket is always closed.

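import socket
import urllib2

def _fetch(self, url):
    # Return the response body, or None if the request timed out.
    try:
        return urllib2.urlopen(urllib2.Request(url), timeout=5).read()
    except urllib2.URLError as e:
        if isinstance(e.reason, socket.timeout):
            return None
        # Not a timeout: re-raise with the original traceback.
        raise

def fetch(self, url):
    # Retry until _fetch returns a body; report each timeout.
    x = self._fetch(url)
    while x is None:
        print "Timeout"
        x = self._fetch(url)
    return x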

Does ANYONE have a better way?

Michael
A: 

I have a suspicion that the socket is still open in the stack frames. When Python raises an exception it stores the stack frames so debuggers and other tools can view the stack and introspect values.

For historical reasons, and now for backwards compatibility, the stack information is stored (on a per-thread basis) in sys (see sys.exc_info(), sys.exc_type and others). This is one of the things which has been removed in Python 3.0.

What that means for you is that the stack is still alive and referenced. The stack contains the local data for some function which holds the open socket. That's why the socket isn't closed yet. Only when the stack trace is released can everything be gc'ed.
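Here is a minimal sketch of that effect, not the actual urllib2 code path. FakeSocket, open_and_fail, catch_and_hold and probe are made-up names for illustration, and the sketch assumes CPython's reference counting. It holds the traceback explicitly through sys.exc_info() and uses a weak reference to check whether a local from the failed function is still alive.

```python
import gc
import sys
import weakref


class FakeSocket(object):
    """Hypothetical stand-in for the socket opened inside urlopen."""


probe = {}  # holds a weak reference so we can observe the object's lifetime


def open_and_fail():
    sock = FakeSocket()                 # a local, like the real open socket
    probe["sock"] = weakref.ref(sock)   # a weak ref does not keep it alive
    raise IOError("simulated timeout")


def catch_and_hold():
    try:
        open_and_fail()
    except IOError:
        # The traceback references the frame of open_and_fail,
        # and that frame still references its local `sock`.
        return sys.exc_info()[2]


tb = catch_and_hold()
held = probe["sock"]() is not None   # object still reachable via the traceback

tb = None       # drop the last reference to the traceback
gc.collect()    # not strictly needed on CPython, but harmless
released = probe["sock"]() is None   # the frame and its locals are gone
```

If the real socket is being kept alive the same way, it should only close once whatever holds the traceback lets go of it.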

To test if that's the case, insert something like

try:
  1/0
except ZeroDivisionError:
  pass

in your except clause. That's a quick way to replace the current exception with something else.

Andrew Dalke
Hmm! A very interesting thought. Thanks, but it doesn't quite work; still, I had never thought of it that way. I think that for my project, my whole approach is just a bit too hacky. It would be better for me not to rely on this and instead terminate duplicate connections on the server.
Michael