I'm working on a multi-threaded scraper for a website and as per a different question I've decided to use the ThreadPool with QueueUserWorkItem().
How can I continually Queue work items without queuing them all at once? I need to queue > 300k items (one for each userID) and if I loop to queue them all I'll run out of memory.
So, what I would like is:
// 1 = startUserID, 300000 = endUserID, 25 = MaxThreads
Scraper webScraper = new Scraper(1, 300000, 25);
webScraper.Start();
// return immediately while webScraper runs in the background
During this time, webScraper is continuouslly adding all 300000 workItems as threads become available.
Here is what I have so far:
public class Scraper
{
private int MaxUserID { get; set; }
private int MaxThreads { get; set; }
private static int CurrentUserID { get; set; }
private bool Running { get; set; }
private Parser StatsParser = new Parser();
public Scraper()
: this(0, Int32.MaxValue, 25)
{
}
public Scraper(int CurrentUserID, int MaxUserID, int MaxThreads)
{
this.CurrentUserID = CurrentUserID;
this.MaxUserID = MaxUserID;
this.MaxThreads = MaxThreads;
this.Running = false;
ThreadPool.SetMaxThreads(MaxThreads, MaxThreads);
}
public void Start()
{
int availableThreads;
// Need to start a new thread to spawn the new WorkItems so Start() will return right away?
while (Running)
{
// if (!CurrentUserID >= MaxUserID)
// {
// while (availableThreads > 0)
// {
// ThreadPool.QueueUserWorkItem(new WaitCallBack(Process));
// }
// }
// else
// { Running = false; }
}
}
public void Stop()
{
Running = false;
}
public static void process(object state)
{
var userID = Interlocked.Increment(ref CurrentUserID);
... Fetch Stats for userID
}
}
Is this the right approach?
Can anyone point me in the right direction for handling the creation of my work items while in the background once Start() is called, and not creating all Work items at once?