I'm aware that urllib2 is available on Google App Engine as a wrapper around urlfetch and, as you know, Universal Feedparser uses urllib2.

Do you know of any way to set a timeout on urllib2?
Has the timeout parameter of urllib2 been ported to the Google App Engine version?

I'm not interested in an approach like:

rssurldata = urlfetch.fetch(rssurl, deadline=..)
feedparser.parse(rssurldata.content)
A: 

Have you tried setting the socket timeout value? Taken from here:

As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has no timeout and can hang. Currently, the socket timeout is not exposed at the httplib or urllib2 levels. However, you can set the default timeout globally for all sockets using:

import socket
import urllib2

# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)

# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)

I'm not sure if GAE reads this value, but it's worth a shot!

Edit:

urllib2 has the ability to pass a timeout parameter:

The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS, FTP and FTPS connections.
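For reference, a minimal sketch of that per-call form on standard CPython 2.6+ (whether App Engine's urllib2 wrapper honors the parameter is exactly the open question here):

import urllib2

req = urllib2.Request('http://www.voidspace.org.uk')
# Per-call timeout in seconds (Python 2.6+); on App Engine the
# urlfetch-backed wrapper may silently ignore this argument.
response = urllib2.urlopen(req, timeout=10)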

advait
@thethimble yep, that was my first try. socket has no setdefaulttimeout method on GAE :). Thanks for your time though.
systempuntoout
What about urllib2's timeout parameter?
advait
I'm asking, "Has the timeout parameter of urllib2 been ported to the Google App Engine version?"
systempuntoout
+1  A: 

There's no simple way to do this, as the wrapper doesn't provide a way to pass the timeout value through, to the best of my knowledge. One hackish option would be to monkeypatch the urlfetch API:

from google.appengine.api import urlfetch

# Wrap urlfetch.fetch so every call gets a 10-second deadline
# unless the caller overrides it explicitly.
old_fetch = urlfetch.fetch
def new_fetch(url, payload=None, method=urlfetch.GET, headers={},
              allow_truncated=False, follow_redirects=True,
              deadline=10.0, *args, **kwargs):
  return old_fetch(url, payload, method, headers, allow_truncated,
                   follow_redirects, deadline, *args, **kwargs)
urlfetch.fetch = new_fetch
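A minimal usage sketch, assuming the patch above lives in a hypothetical module named urlfetch_patch that gets imported before anything calls the API:

# main.py (hypothetical layout): importing the patch module first
# replaces urlfetch.fetch before feedparser (via urllib2) can call it.
import urlfetch_patch  # hypothetical module holding the monkeypatch above
import feedparser

# feedparser's internal urllib2 fetches now inherit deadline=10.0
feed = feedparser.parse('http://example.com/rss.xml')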
Nick Johnson
@Nick where's the correct place to apply this patch? In main, or right before the call to the crawling library?
systempuntoout
Top-level in any module that gets imported before you use the API.
Nick Johnson
@Nick worked like a charm, thanks.
systempuntoout