I'm aware that urllib2 is available on Google App Engine as a wrapper around urlfetch and, as you know, Universal Feedparser uses urllib2.

Do you know of any way to set a timeout on urllib2?
Has the timeout parameter of urllib2 been ported to the Google App Engine version?

I'm not interested in an approach like:

rssurldata = urlfetch.fetch(rssurl, deadline=..)
feedparser.parse(rssurldata.content)
A: 

Have you tried setting the socket timeout value? Taken from here:

As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has no timeout and can hang. Currently, the socket timeout is not exposed at the httplib or urllib2 levels. However, you can set the default timeout globally for all sockets using:

import socket
import urllib2

# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)

# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)

I'm not sure if GAE reads this value, but it's worth a shot!

Edit:

urllib2 has the ability to pass a timeout parameter:

The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS, FTP and FTPS connections.
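For reference, a minimal sketch of that per-call form on standard CPython 2.6+ (whether App Engine's urllib2 wrapper honors the parameter is exactly the open question here):

import urllib2

req = urllib2.Request('http://www.voidspace.org.uk')
# Per-call timeout in seconds (Python 2.6+); on App Engine the
# urlfetch-backed wrapper may silently ignore this argument.
response = urllib2.urlopen(req, timeout=10)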

advait
@thethimble yep, that was my first try. socket has no setdefaulttimeout method on GAE :). Thanks for your time though.
systempuntoout
What about urllib2's timeout parameter?
advait
I'm asking, "Has the timeout parameter of urllib2 been ported to the Google App Engine version?"
systempuntoout
+1  A: 

There's no simple way to do this, as the wrapper doesn't provide a way to pass the timeout value through, to the best of my knowledge. One hackish option would be to monkeypatch the urlfetch API:

from google.appengine.api import urlfetch

# Wrap urlfetch.fetch so every call gets a 10-second deadline
# unless the caller overrides it explicitly.
old_fetch = urlfetch.fetch
def new_fetch(url, payload=None, method=urlfetch.GET, headers={},
              allow_truncated=False, follow_redirects=True,
              deadline=10.0, *args, **kwargs):
  return old_fetch(url, payload, method, headers, allow_truncated,
                   follow_redirects, deadline, *args, **kwargs)
urlfetch.fetch = new_fetch
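A minimal usage sketch, assuming the patch above lives in a hypothetical module named urlfetch_patch that gets imported before anything calls the API:

# main.py (hypothetical layout): importing the patch module first
# replaces urlfetch.fetch before feedparser (via urllib2) can call it.
import urlfetch_patch  # hypothetical module holding the monkeypatch above
import feedparser

# feedparser's internal urllib2 fetches now inherit deadline=10.0
feed = feedparser.parse('http://example.com/rss.xml')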
Nick Johnson
@Nick where's the correct place to apply this patch? In main, or right before the call to the crawling library?
systempuntoout
Top-level in any module that gets imported before you use the API.
Nick Johnson
@Nick worked like a charm, thanks.
systempuntoout