What would be the best library for multithreaded harvesting/downloading with multiple proxy support? I've looked at Tkinter, it looks good but there are so many, does anyone have a specific recommendation? Many thanks!

A:

Twisted

anthony
Thanks, I'm taking a look now
Cookies
A: 

Is this something you can't just do by passing a URL to newly spawned threads and calling urllib2.urlopen in each one, or is there a more specific requirement?
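A minimal sketch of that approach, with per-thread proxy support added because the question asks for it. It assumes Python 3, where `urllib2` became `urllib.request`. `fetch_into` and `fetch_all` are hypothetical helper names, and assigning proxies round-robin is only one possible policy. Each thread builds its own opener and writes only to its own slot of the results list, so no object is mutated by two threads.

```python
import itertools
import threading
import urllib.request


def fetch_into(results, index, url, proxy, timeout=10):
    """Download `url` (through `proxy` if given) and store the body in results[index]."""
    # An empty mapping disables proxies entirely, including any http_proxy env var.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    # Each thread gets its own opener, so no opener is shared between threads.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    try:
        with opener.open(url, timeout=timeout) as resp:
            results[index] = resp.read()
    except Exception as exc:  # keep the error instead of losing it inside the thread
        results[index] = exc


def fetch_all(urls, proxies=(None,)):
    """Spawn one thread per URL, assigning proxies round-robin; return bodies in URL order."""
    results = [None] * len(urls)
    threads = []
    for i, (url, proxy) in enumerate(zip(urls, itertools.cycle(proxies))):
        t = threading.Thread(target=fetch_into, args=(results, i, url, proxy))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()  # wait for every download to finish
    return results
```

For example, `fetch_all(urls, proxies=["http://10.0.0.1:3128", "http://10.0.0.2:3128"])` would alternate between two placeholder proxy addresses. One thread per URL is reasonable for dozens of URLs; for thousands, a bounded pool of worker threads is usually the better design.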

Kylotan
urllib2 isn't thread-safe from what I've seen, but I could have just been doing it wrong because I'm a noob to threading. I'm downloading a lot of files, so I'd rather use something a bit more powerful than plain urllib anyway.
Cookies
It's almost certain to be thread-safe unless you do something inherently dangerous like trying to access the same object from multiple threads.
Kylotan
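To illustrate the point about not sharing objects between threads, here is a minimal sketch of a fixed-size worker pool for downloading many files. The only objects shared between threads are two `queue.Queue` instances, which are designed for exactly that; each worker owns its opener. It assumes Python 3's `urllib.request` in place of `urllib2`, and the names `worker` and `download`, as well as the one-thread-per-proxy layout, are hypothetical choices for illustration.

```python
import queue
import threading
import urllib.request


def worker(url_queue, result_queue, proxy):
    """Pull URLs until a None sentinel arrives; push (url, bytes-or-exception) results."""
    # This thread's private opener; None means a direct connection (env proxies ignored).
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    while True:
        url = url_queue.get()
        if url is None:  # sentinel: no more work for this thread
            url_queue.task_done()
            break
        try:
            with opener.open(url, timeout=10) as resp:
                result_queue.put((url, resp.read()))
        except Exception as exc:
            result_queue.put((url, exc))
        finally:
            url_queue.task_done()


def download(urls, proxies):
    """Start one worker per entry in `proxies`; return a dict of url -> bytes or exception."""
    url_queue = queue.Queue()
    result_queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(url_queue, result_queue, p))
               for p in proxies]
    for t in threads:
        t.start()
    for url in urls:
        url_queue.put(url)
    for _ in threads:
        url_queue.put(None)  # one sentinel per worker so every thread exits
    for t in threads:
        t.join()
    # All workers have exited, so draining the result queue here is safe.
    results = {}
    while not result_queue.empty():
        url, body = result_queue.get()
        results[url] = body
    return results
```

Passing `proxies=[None] * 4` would run four direct-connection workers; the pool size stays fixed no matter how many URLs are queued.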
A: 

Also take a look at http://scrapy.org/, which is a scraping framework built on top of twisted.

twneale
Excellent, I don't see anything about proxy support but I think I could do that myself.
Cookies
No. Support for HTTP proxies is not currently implemented in Scrapy, but it will be in the future. For more information about this, follow this ticket. Setting the http_proxy environment variable won’t work because Twisted (the library used by Scrapy to download pages) doesn’t support it. See this Twisted ticket for more info.
Cookies