I'm currently writing a sitemap generator that scrapes a site for URLs and builds an XML sitemap. Since most of the time is spent waiting on requests to URIs, I'm using threading, specifically the built-in ThreadPool.
To let the main thread wait for an unknown number of work items to complete, I've implemented the setup below. I don't feel this is a good solution, though. Can any threading gurus advise me of any problems it has, or suggest a better way to implement it?
The EventWaitHandle is created with EventResetMode.ManualReset.
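For reference, the fields used in the snippets below are declared roughly like this (a simplified sketch: the actual regex pattern, the collection type, and where _waitCallback gets assigned are just placeholders here and don't matter to the question):

private int _threadCount;                      // number of queued/running work items
private readonly EventWaitHandle _eventWaitHandle =
    new EventWaitHandle(false, EventResetMode.ManualReset);
private readonly WaitCallback _waitCallback;   // assigned new WaitCallback(CrawlUri) before crawling starts
private readonly List<Uri> _uriCollection = new List<Uri>();        // URIs found so far
private readonly Regex _regex = new Regex(@"(?<=href="")[^""]+");   // placeholder link pattern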
Here is the thread method:
protected void CrawlUri(object o)
{
    try
    {
        Interlocked.Increment(ref _threadCount);
        Uri uri = (Uri)o;
        foreach (Match match in _regex.Matches(GetWebResponse(uri)))
        {
            Uri newUri = new Uri(uri, match.Value);
            if (!_uriCollection.Contains(newUri))
            {
                _uriCollection.Add(newUri);
                ThreadPool.QueueUserWorkItem(_waitCallback, newUri);
            }
        }
    }
    catch
    {
        // Handle exceptions
    }
    finally
    {
        Interlocked.Decrement(ref _threadCount);
    }

    // If there are no more threads running then signal the wait handle
    if (_threadCount == 0)
        _eventWaitHandle.Set();
}
Here is the code that runs on the main thread:
// Request first page (based on host)
Uri root = new Uri(context.Request.Url.GetLeftPart(UriPartial.Authority));
// Begin threaded crawling of the Uri
ThreadPool.QueueUserWorkItem(_waitCallback, root);
Thread.Sleep(5000); // TEMP SOLUTION: Sleep for 5 seconds
_eventWaitHandle.WaitOne();
// Serve the XML sitemap
context.Response.ContentType = "text/xml";
context.Response.Write(GetXml().OuterXml);
Any ideas are much appreciated :)