I've got a chunk of code that looks like this:
    var task = Task.Factory.StartNew(() =>
    {
        while (!bc.IsCompleted && !cts.Token.IsCancellationRequested)
        {
            PriorityDownloadPair pd;
            if (bc.TryTake(out pd))
            {
                var baseUri = pd.Value.Uri;
                Console.WriteLine("({0}) {1}", pd.Key, baseUri.AbsoluteUri);
                IEnumerable<HtmlNode> sq = null;
                try
                {
                    sq = SharpQuery.SharpQuery.Load(baseUri);
                }
                catch (WebException we)
                {
                    Console.WriteLine(we.Message);
                    continue;
                }
                foreach (var node in sq.Find("a[href]"))
                {
                    bc.Add(new PriorityDownloadPair(1, new DownloadItem { Uri = new Uri(baseUri, node.Attributes["href"].Value) }));
                }
            }
        }
    }, cts.Token);
It runs fine for a while (following and downloading every link it finds) until it hits a 404.
The 404 occurs in the SharpQuery.Load method as I'd expect:
    public static IEnumerable<HtmlNode> Load(Uri uri)
    {
        var doc = new HtmlDocument();
        WebClient wc = new WebClient();
        using (var str = wc.OpenRead(uri))
            doc.Load(str);
        yield return doc.DocumentNode;
    }
But then why isn't my try block catching it?
If I go up the call stack it points to this line instead:
    foreach (var node in sq.Find("a[href]"))
But sq.Find doesn't even touch any web interfaces. What's going on?
These lines are synchronous,
    using (var str = wc.OpenRead(uri))
        doc.Load(str);
aren't they? So they shouldn't throw an error later on, after the page has finished loading, should they?
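For what it's worth, here is a minimal sketch that seems to reproduce the same pattern without any networking. Everything in it is a stand-in: DeferredDemo, the string-based Load, and the InvalidOperationException are hypothetical names I made up to mimic SharpQuery.Load and the WebException. The only thing it shares with the real code is that Load contains a yield return, which is what I suspect matters here.

```csharp
using System;
using System.Collections.Generic;

static class DeferredDemo
{
    // Hypothetical stand-in for SharpQuery.Load: the throw plays the role
    // of wc.OpenRead failing with a 404.
    public static IEnumerable<string> Load(string uri)
    {
        if (uri.EndsWith("/missing"))
            throw new InvalidOperationException("404: " + uri);
        yield return uri; // the yield makes this an iterator method
    }

    public static string Run()
    {
        IEnumerable<string> sq;
        try
        {
            // This call only creates the iterator object; the body of Load
            // has not started executing yet.
            sq = Load("http://example.com/missing");
        }
        catch (InvalidOperationException)
        {
            return "caught in try";
        }

        try
        {
            // The body of Load runs on the first MoveNext, i.e. here.
            foreach (var s in sq) { }
        }
        catch (InvalidOperationException)
        {
            return "thrown in foreach";
        }
        return "no exception";
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "thrown in foreach"
    }
}
```

If the same thing applies to my real Load, then enumerating the result inside the try block (or dropping the yield return and returning a plain collection) would presumably bring the exception back into the catch, but I haven't confirmed that against SharpQuery itself.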