Hi there,
I'm currently retrieving and parsing pages from a website using urllib2. However, there are many of them (more than 1000), and processing them sequentially is painfully slow.
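To give an idea of the setup, my code is roughly shaped like this (parse_page and the URL list are just placeholders standing in for my real parsing code and page list):

import urllib2

def parse_page(html):
    pass  # my actual parsing logic goes here

urls = ["http://example.com/page%d" % i for i in range(1000)]  # real list comes from elsewhere

# One request at a time; this is the slow part.
for url in urls:
    html = urllib2.urlopen(url).read()
    parse_page(html)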
I was hoping there was a way to retrieve and parse the pages in parallel. Is that a good idea, and if so, how do I do it?
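For what it's worth, here is an untested sketch of what I had in mind: a fixed pool of worker threads pulling URLs off a Queue, so that at most NUM_WORKERS requests are in flight at once. NUM_WORKERS, parse_page, and the URL list are placeholders, not my real code:

import urllib2
import threading
import Queue

NUM_WORKERS = 5  # no idea if this is a sensible number (see my question below)

def parse_page(html):
    pass  # my actual parsing logic goes here

def worker(q):
    # Each worker keeps pulling URLs until the queue is empty, then exits.
    while True:
        try:
            url = q.get_nowait()
        except Queue.Empty:
            return
        try:
            html = urllib2.urlopen(url, timeout=30).read()
            parse_page(html)
        finally:
            q.task_done()

urls = ["http://example.com/page%d" % i for i in range(1000)]

q = Queue.Queue()
for url in urls:
    q.put(url)

threads = [threading.Thread(target=worker, args=(q,)) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()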
Also, what are "reasonable" values for the number of pages to process in parallel (I wouldn't want to put too much strain on the server or get banned because I'm using too many connections)?
Thanks!