views:

77

answers:

2

I'm still relatively new to Python, so if this is an obvious question, I apologize.

My question is in regard to the urllib2 library and its urlopen function. Currently I'm using it to load a large number of pages from another server (they are all on the same remote host), but the script is killed every now and then by a timeout error (I assume this comes from the large requests).

Is there a way to keep the script running after a timeout? I'd like to fetch all of the pages, so I want a script that will keep trying until it gets a page and then move on to the next one.

On a side note, would keeping the connection open to the server help?

+1  A: 

Next time the error occurs, take note of the error message. The last line will tell you the type of exception. For example, it might be a urllib2.HTTPError. Once you know the type of exception raised, you can catch it in a try...except block. For example:

import urllib2
import time

for url in urls:
    while True:
        try:
            sock = urllib2.urlopen(url)
        except (urllib2.HTTPError, urllib2.URLError) as err:
            # You may want to count how many times you reach here and
            # do something smarter if you fail too many times
            # (see the sketch after this code block).
            # If a site is down, pestering it every 10 seconds may not
            # be very fruitful or polite.
            time.sleep(10)
        else:
            # Success
            contents = sock.read()
            sock.close()
            # process contents
            break            # break out of the while loop
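
Following up on the comment about counting failures, here is one way to cap the retries. This is only a rough sketch: max_retries, the 10-second wait, and the "give up" message are my own choices, and urls is assumed to be defined as above.

import urllib2
import time

max_retries = 5    # arbitrary limit; adjust to taste

for url in urls:
    for attempt in range(max_retries):
        try:
            sock = urllib2.urlopen(url)
        except (urllib2.HTTPError, urllib2.URLError):
            # Wait before retrying; give up after max_retries failures.
            time.sleep(10)
        else:
            contents = sock.read()
            sock.close()
            # process contents
            break            # success, stop retrying this url
    else:
        # The inner loop finished without a break: every attempt failed.
        print "Giving up on %s" % url

Python's for...else runs the else block only when the loop finishes without hitting break, which makes it a convenient place to handle the "all retries failed" case.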
unutbu
So, if I understand correctly, this will make it "try" until it doesn't return an error?
Parker
@Parker: When Python reaches the code in the `try` block, if a `urllib2.HTTPError` or `urllib2.URLError` occurs, Python will go to the `except` block. If no exception occurs, then Python will go to the `else` block.
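
A minimal sketch of that flow (the URL is just a placeholder):

import urllib2

try:
    sock = urllib2.urlopen("http://example.com/page")  # placeholder URL
except urllib2.URLError:   # HTTPError is a subclass of URLError
    print "An exception was raised, so this except block runs."
else:
    print "No exception was raised, so this else block runs."
    sock.close()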
unutbu
A: 

The urllib2 missing manual might help you.

mykhal