I'm downloading a huge set of files with the following code in a loop:

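import urllib

# downloads: hypothetical list of (url, local path) pairs
for url2download, destination_on_local_filesystem in downloads:
    try:
        urllib.urlretrieve(url2download, destination_on_local_filesystem)
    except KeyboardInterrupt:
        break
    except Exception:
        print "Timed-out or got some other exception: " + url2download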

If the server times out on url2download while the connection is still being established, the last except clause handles it properly. But sometimes the server responds and the download starts, yet the server is so slow that even one file would take hours, and eventually it returns something like:

Enter username for Clients Only at albrightandomalley.com:
Enter password for  in Clients Only at albrightandomalley.com:

and just hangs there (although no username or password is asked for when the same link is downloaded through a browser).

What I want in this situation is to skip the file and go on to the next one. How can I do that? Is there a way in Python to specify how long downloading one file may take, and, once that time is exceeded, interrupt it and move on?
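The two prompts above match what Python 2's urllib.FancyURLopener (the opener behind urllib.urlretrieve) prints when a server answers 401 Unauthorized: it asks for credentials on the terminal and waits for input, which would explain the hang. Python 3's urllib.request.urlopen raises HTTPError for a 401 instead of prompting. Below is a minimal Python 3 sketch of the loop above under that assumption; download_all and jobs are hypothetical names, and the timeout only bounds each blocking socket operation, not the total time spent on one file.

```python
import shutil
import urllib.request


def download_all(jobs, timeout=30.0):
    """Download each (url, dest) pair in jobs and return the URLs that failed.

    Hypothetical helper sketching the question's loop in Python 3.
    """
    failed = []
    for url, dest in jobs:
        try:
            # timeout applies to each blocking socket operation (connect,
            # each read), not to the whole transfer.
            with urllib.request.urlopen(url, timeout=timeout) as resp, \
                    open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
        except OSError as exc:
            # URLError, HTTPError (e.g. a 401) and socket timeouts are all
            # OSError subclasses; KeyboardInterrupt is not, so Ctrl-C still
            # stops the whole loop, like the break in the original.
            print(f"Timed out or failed ({exc}): {url}")
            failed.append(url)
    return failed
```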

+2  A: 

There's a discussion of this here. Caveats (in addition to the ones they mention): I haven't tried it, and they're using urllib2, not urllib (would that be a problem for you?). Actually, now that I think about it, the technique would probably work for urllib too.
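The linked discussion isn't reproduced here, so this is an assumption about the technique: a common way to put a timeout on urllib or urllib2 is socket.setdefaulttimeout, which applies to every socket created afterwards and therefore works for both modules (in Python 2 the same two socket calls can wrap urllib.urlretrieve). A minimal Python 3 sketch; urlretrieve_with_timeout is a hypothetical helper name.

```python
import socket
import urllib.request


def urlretrieve_with_timeout(url, dest, timeout):
    """Call urllib.request.urlretrieve with a temporary default socket timeout.

    socket.setdefaulttimeout is process-wide, so the previous value is
    restored afterwards; this is not safe if other threads open sockets
    at the same time.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        return urllib.request.urlretrieve(url, dest)
    finally:
        socket.setdefaulttimeout(previous)
```

The timeout limits each connect or read, so a server that keeps sending a byte now and then can still hold a download open far longer than the timeout.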

Jacob Gabrielson
+3  A: 

If you're not limited to what ships with Python out of the box, the urlgrabber module might come in handy:

import urlgrabber
urlgrabber.urlgrab(url2download, destination_on_local_filesystem,
                   timeout=30.0)
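If urlgrabber's timeout is, as I understand it, a limit per socket operation, it stops a server that goes silent but not one that keeps trickling data. Below is a standard-library sketch (Python 3) that also caps total wall-clock time per file by checking a deadline between chunks; fetch_with_deadline and its parameters are hypothetical.

```python
import time
import urllib.request


def fetch_with_deadline(url, dest, max_seconds, socket_timeout=30.0,
                        chunk_size=64 * 1024):
    """Download url to dest, giving up once max_seconds of wall-clock time pass.

    Hypothetical helper: socket_timeout guards each individual read, and the
    deadline check between chunks caps the whole transfer, so a server that
    trickles bytes is abandoned after roughly max_seconds + socket_timeout.
    """
    deadline = time.monotonic() + max_seconds
    with urllib.request.urlopen(url, timeout=socket_timeout) as resp, \
            open(dest, "wb") as out:
        while True:
            if time.monotonic() > deadline:
                raise TimeoutError(f"gave up on {url} after {max_seconds} s")
            # read1 returns what a single underlying read delivers instead of
            # blocking until a full chunk_size bytes have arrived.
            chunk = resp.read1(chunk_size)
            if not chunk:
                return
            out.write(chunk)
```

A file abandoned this way is left partially written at dest, so a caller may want to delete it before moving on to the next URL.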
RommeDeSerieux
+1  A: 

This question covers the more general problem of timing out a function call: http://stackoverflow.com/questions/366682/how-to-limit-execution-time-of-a-function-call-in-python

I've used the method described in my answer there to write a wait-for-text function that times out, for attempting an auto-login. If you'd like similar functionality, you can find the code here:

http://code.google.com/p/psftplib/source/browse/trunk/psftplib.py
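One approach often given for the linked question uses SIGALRM; I can't confirm it matches the code in psftplib, so treat this as a separate Python 3 sketch. It works only on Unix and only in the main thread, and the handler raises inside whatever the program is blocked in (as far as I know, that includes a terminal prompt like the one in the question). time_limit and TimeLimitExceeded are hypothetical names.

```python
import signal
from contextlib import contextmanager


class TimeLimitExceeded(Exception):
    """Raised when the guarded block runs longer than allowed."""


@contextmanager
def time_limit(seconds):
    """Raise TimeLimitExceeded if the with-block runs longer than seconds.

    Unix only, main thread only: relies on SIGALRM and an interval timer.
    """
    def handler(signum, frame):
        raise TimeLimitExceeded(f"timed out after {seconds} s")

    previous = signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer and put back whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
```

Wrapping each urllib.request.urlretrieve(url, dest) call in "with time_limit(300):" and catching TimeLimitExceeded alongside OSError would then skip any file that takes longer than five minutes.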

monkut