Hi guys,

I've created a web spider that accesses both a US and an EU server. The two servers have the same data structure but hold different data, and I want to collate it all. To be nice to the servers, there's a wait time between each request. Since the program is exactly the same for both, I've threaded it so it can access the EU and US servers simultaneously and speed up processing.

This crawling will take on the order of weeks, not days. There will be exceptions, and while I've tried to handle everything inside the program, it's likely something weird might crop up. To be truly defensive about this, I'd like to catch a thread that's failed, log the error and restart it. Worst case I lose a handful of pages out of thousands, which is better than having a thread fail and lose 50% of speed. However, from what I've read, Python threads die silently. Does anyone have any ideas?

import threading

import QueueManager  # the asker's own module, not shown here


class AccessServer(threading.Thread):
    def __init__(self, site):
        threading.Thread.__init__(self)
        self.site = site
        self.qm = QueueManager.QueueManager(site)

    def run(self):
        # Do stuff here
        pass


def main():
    us_thread = AccessServer(u"us")
    us_thread.start()

    eu_thread = AccessServer(u"eu")
    eu_thread.start()
+4  A: 

Can you have the main thread function as a monitoring thread? For example, require that each worker thread regularly update some thread-specific timestamp value, and if a thread hasn't updated its timestamp within a suitable time, have the monitoring thread kill it and restart it.

Or, see this answer
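
A minimal sketch of the timestamp idea, not a definitive implementation. `HeartbeatWorker`, `is_stalled`, and the `work` callable are hypothetical names introduced for illustration. Note that Python's `threading` module offers no supported way to kill a thread from outside, so this sketch only detects a dead or stalled worker; how to react (for example, starting a replacement) is left open.

```python
import threading
import time


class HeartbeatWorker(threading.Thread):
    """Hypothetical worker that records a timestamp after each unit of work."""

    def __init__(self, site, work):
        # daemon=True so a stuck worker cannot keep the process alive on exit
        super().__init__(daemon=True)
        self.site = site
        self.work = work  # assumed callable that performs one request for `site`
        self.last_beat = time.monotonic()

    def run(self):
        while True:
            self.work(self.site)
            self.last_beat = time.monotonic()  # the "heartbeat"


def is_stalled(worker, timeout, now=None):
    """Return True if the worker has died or missed its heartbeat deadline."""
    now = time.monotonic() if now is None else now
    return (not worker.is_alive()) or (now - worker.last_beat > timeout)
```

The main thread could call `is_stalled(worker, timeout)` in a loop for each worker, with `timeout` chosen comfortably larger than the polite wait between requests.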

janneb
That's a good idea, and that thread you pointed me to is great. Thanks for your help!
Lewisham
+6  A: 

Just use a try: ... except: ... block in the run method. If something weird happens that causes the thread to fail, it's highly likely that an error will be thrown somewhere in your code (as opposed to in the threading subsystem itself); this way you can catch it, log it, and restart the thread. It's your call whether you want to actually shut down the thread and start a new one, or just enclose the try/except block in a while loop so the same thread keeps running.
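
As a rough sketch of the "same thread keeps running" variant, assuming the per-page work can be wrapped in a single callable: `ResilientWorker` and `fetch_next` are hypothetical names, and `fetch_next` is assumed to return a false value once there is nothing left to crawl.

```python
import logging
import threading


class ResilientWorker(threading.Thread):
    """Hypothetical worker that logs errors and carries on with the next page."""

    def __init__(self, site, fetch_next):
        super().__init__()
        self.site = site
        self.fetch_next = fetch_next  # assumed: processes one page, False when done
        self.errors = 0

    def run(self):
        while True:
            try:
                if not self.fetch_next(self.site):
                    break  # no more work, let the thread finish normally
            except Exception:
                # Log the full traceback, skip this page, and keep going
                self.errors += 1
                logging.exception("Error while crawling %s; continuing", self.site)
```

Catching `Exception` rather than using a bare `except:` leaves `KeyboardInterrupt` and `SystemExit` alone, which is usually what you want in a long-running job.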

Another solution, if you suspect that something really weird might happen which you can't detect through Python's error handling mechanism, would be to start a monitor thread that periodically checks to see that the other threads are running properly.
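
One way such a monitor might look, sketched under the assumption that workers are meant to run indefinitely, so any thread that is no longer alive counts as failed. `supervise` and its `factories` argument are hypothetical; a `Thread` object cannot be started twice, which is why each restart builds a fresh thread from a factory.

```python
import threading


def supervise(factories, poll_interval=1.0, stop_event=None):
    """Start one thread per factory and replace any thread that has died.

    factories: dict mapping a name to a zero-argument callable that returns
    a new, unstarted threading.Thread. Returns restart counts per name.
    """
    stop_event = stop_event or threading.Event()
    threads = {name: make() for name, make in factories.items()}
    for thread in threads.values():
        thread.start()
    restarts = {name: 0 for name in factories}

    # Event.wait returns False on timeout, True once the event is set
    while not stop_event.wait(poll_interval):
        for name, thread in list(threads.items()):
            if not thread.is_alive():
                threads[name] = factories[name]()
                threads[name].start()
                restarts[name] += 1
    return restarts
```

With the question's class this could be called as `supervise({"us": lambda: AccessServer(u"us"), "eu": lambda: AccessServer(u"eu")})` from `main()`. It only notices threads that have exited, not threads that are hung.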

David Zaslavsky
Didn't think to enclose a `try: except:` in the `run` method, that seems like a good, Pythonic way to do it. Thanks!
Lewisham