views: 238

answers: 2
I can download multiple files quite fast with many threads at once, but the problem is that after a few minutes it gradually slows down to almost a full stop, and I have no idea why. There's nothing wrong with my code that I can see, and my RAM/CPU usage is fine. The only thing I can think of is that urllib2 isn't handling the massive number of connections correctly. If it helps, I am using proxies, but I had this issue without them as well. Does anyone have any suggestions or insight into this issue? Thanks!

+1  A: 

Can you confirm that doing the same number of simultaneous downloads without python continues to download fast? Perhaps the issue is not with your code, but with your connection getting throttled or with the site serving the files.

If that's not the issue, you could try the pyprocessing library to implement a multi-process version instead of a multi-threaded one. If you're using Python 2.6, pyprocessing is included in the standard library as multiprocessing. It's quite easy to convert threaded code to multi-process code, so it's worth a try, if only to confirm the issue is with the threading.
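For example, a rough sketch of a multi-process downloader using the stdlib multiprocessing module (assuming Python 2.6+; the URLs, worker count, and fetch helper below are illustrative placeholders, not code from the question) might look like this:

    import urllib2
    from multiprocessing import Pool

    def fetch(url):
        # Each URL is fetched in a separate worker process, so contention
        # inside a single interpreter cannot slow down the other downloads.
        try:
            data = urllib2.urlopen(url, timeout=30).read()
            return url, len(data)
        except Exception:
            return url, None

    if __name__ == '__main__':
        urls = ['http://example.com/a.zip', 'http://example.com/b.zip']  # placeholders
        pool = Pool(processes=8)   # roughly match your current thread count
        for url, size in pool.map(fetch, urls):
            print url, size

If the slowdown disappears with processes instead of threads, that points at the threading; if it persists, look at the connection or the server.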

Parand
A: 

As another answer suggested, the problem might be with your connection or with the site that is serving the files. If you can run your code against a local test server, you will be able to eliminate this possibility.

If the problem goes away when using the test server then the problem lies with your connection or the remote server.

If the problem persists when using the test server, then it's most likely something in your code, but at least you will have the server logs to give you more insight into what is happening.
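For example, a minimal local test server (a sketch assuming Python 2 and that copies of the files live in the current directory) can be built entirely from the standard library:

    # Serves the current directory on http://localhost:8000/ so the
    # downloader can be pointed at local copies of the files.
    import SimpleHTTPServer
    import SocketServer

    httpd = SocketServer.TCPServer(('', 8000), SimpleHTTPServer.SimpleHTTPRequestHandler)
    httpd.serve_forever()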

As another avenue to explore, this thread suggests using httplib2 instead of urllib2.
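A minimal httplib2 fetch looks something like this (a sketch; the URL and timeout are placeholders); one relevant difference is that a single Http object keeps connections alive and reuses them across requests:

    import httplib2

    # One Http object per thread/process; it reuses connections rather
    # than opening a fresh one for every request as urllib2 does.
    h = httplib2.Http(timeout=30)
    resp, content = h.request('http://example.com/a.zip', 'GET')
    print resp.status, len(content)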

Rod Hyde