I'm going to screen-scrape a gaming website for some data. I'd like to be able to send multiple requests so I can screen-scrape several pages at once. I've emailed the site administrator and gotten permission to scrape at a moderate rate (a few requests per second).

As far as I know, BackgroundWorker uses the thread pool, which I think would be desirable.
Does it make sense to use BackgroundWorker for this use case, or should I use actual Threads?

+5  A: 

There is another construct known as a ThreadPool. It might be worth using, as it will manage multiple threads for you and lets you control the min/max number of threads. BackgroundWorker is limited to one thread and is best used for WinForms apps where you have background I/O and don't want to lock the user interface thread.
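As a rough illustration of the min/max control mentioned above, here is a minimal sketch using ThreadPool.GetMaxThreads and ThreadPool.SetMaxThreads. The class and method names (PoolLimits, CurrentMaxWorkers, TryCapWorkers) are made up for this example. Note that these limits apply to the whole process, not just the scraper.

```csharp
using System;
using System.Threading;

// Hypothetical helper class; the names are illustrative only.
public static class PoolLimits
{
    // Returns the current upper bound on worker threads in the pool.
    public static int CurrentMaxWorkers()
    {
        int workers, ioThreads;
        ThreadPool.GetMaxThreads(out workers, out ioThreads);
        return workers;
    }

    // Tries to cap worker threads at 'maxWorkers', leaving the I/O
    // completion-port limit unchanged. SetMaxThreads returns false when
    // the runtime rejects the value (for example, if it is too low).
    public static bool TryCapWorkers(int maxWorkers)
    {
        int workers, ioThreads;
        ThreadPool.GetMaxThreads(out workers, out ioThreads);
        return ThreadPool.SetMaxThreads(maxWorkers, ioThreads);
    }
}
```

Because the cap is process-wide, it is a blunt tool for rate limiting; spacing out the requests themselves is usually the gentler option.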

You will want to keep a queue of pages to scrape and feed these to the thread pool. You may still want to pause or limit the threads to get the intended level of scraping. I would personally separate parsing of retrieved page content from the actual retrieval of the pages over HTTP. This would generally make things easier to maintain and you may not need the local processing to be multi-threaded.
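The queue-and-feed idea above could be sketched roughly as follows. This is only one possible shape, not a definitive implementation: ScrapeQueue, FetchAll, the fetch delegate, and the delayBetweenStarts parameter are all hypothetical names introduced for the example. Retrieval is passed in as a delegate so that parsing stays separate and can run later on a single thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper; names and structure are assumptions for illustration.
public static class ScrapeQueue
{
    // Runs 'fetch' for every URL on thread-pool threads, starting at most
    // one request per 'delayBetweenStarts'. Returns the raw page bodies
    // keyed by URL so that parsing can happen afterwards, outside the pool.
    public static IDictionary<string, string> FetchAll(
        IEnumerable<string> urls,
        Func<string, string> fetch,
        TimeSpan delayBetweenStarts)
    {
        var results = new ConcurrentDictionary<string, string>();
        var pending = new Queue<string>(urls);
        if (pending.Count == 0) return results;

        using (var done = new CountdownEvent(pending.Count))
        {
            while (pending.Count > 0)
            {
                string url = pending.Dequeue();
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { results[url] = fetch(url); }
                    // Simplification: failures are recorded as text.
                    catch (Exception ex) { results[url] = "ERROR: " + ex.Message; }
                    finally { done.Signal(); }
                });

                // Crude rate limit: space out the start of each request.
                if (pending.Count > 0) Thread.Sleep(delayBetweenStarts);
            }

            // Block until every queued work item has signalled completion.
            done.Wait();
        }
        return results;
    }
}
```

With a real fetcher, something like `url => new WebClient().DownloadString(url)` could be passed as the delegate, and a delay of a few hundred milliseconds would keep the rate to a few requests per second.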

BrianLy
The Bgw uses the ThreadPool; it's just an interface to the pool. And there is no reason not to make as many Bgws as you need.
Henk Holterman
Although it is technically possible, it is far from the best tool for the job here. You would end up writing your own system to manage the BGWs, which might look similar to a ThreadPool.
BrianLy
+2  A: 

The typical use of BackgroundWorker is keeping a UI responsive; here, use the thread pool instead to queue multiple HTTP requests/responses.

See ThreadPool.QueueUserWorkItem
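For reference, a minimal sketch of a single ThreadPool.QueueUserWorkItem call, using the overload that takes a state object. QueueExample and RunOnPool are hypothetical names, and the string concatenation stands in for a real HTTP request.

```csharp
using System;
using System.Threading;

// Hypothetical example class; the "fetch" is a placeholder, not a real request.
public static class QueueExample
{
    public static string RunOnPool(string url)
    {
        string result = null;
        using (var finished = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(state =>
            {
                // 'state' is the object passed as the second argument below.
                result = "fetched " + (string)state;
                finished.Set();
            }, url);

            // Wait for the pool thread to finish before reading 'result'.
            finished.WaitOne();
        }
        return result;
    }
}
```

In a real scraper you would queue many such items and wait for all of them together rather than blocking after each one.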

Mitch Wheat