I've got a chunk of code that looks like this:

var task = Task.Factory.StartNew(() =>
{
    while (!bc.IsCompleted && !cts.Token.IsCancellationRequested)
    {
        PriorityDownloadPair pd;
        if (bc.TryTake(out pd))
        {
            var baseUri = pd.Value.Uri;
            Console.WriteLine("({0}) {1}", pd.Key, baseUri.AbsoluteUri);
            IEnumerable<HtmlNode> sq = null;
            try
            {
                sq = SharpQuery.SharpQuery.Load(baseUri);
            }
            catch (WebException we)
            {
                Console.WriteLine(we.Message);
                continue;
            }
            foreach (var node in sq.Find("a[href]"))
            {
                bc.Add(new PriorityDownloadPair(1, new DownloadItem { Uri = new Uri(baseUri, node.Attributes["href"].Value) }));
            }
        }
    }
}, cts.Token);

It runs fine for a while (following and downloading every link it finds) until it hits a 404.

The 404 occurs in the SharpQuery.Load method as I'd expect:

public static IEnumerable<HtmlNode> Load(Uri uri)
{
    var doc = new HtmlDocument();
    WebClient wc = new WebClient();
    using (var str = wc.OpenRead(uri))
        doc.Load(str);
    yield return doc.DocumentNode;
}

But then why isn't my try block catching it?

If I go up the call stack it points to this line instead:

foreach (var node in sq.Find("a[href]"))

But sq.Find doesn't even touch any web interfaces. What's going on?

These lines are synchronous,

        using (var str = wc.OpenRead(uri))
            doc.Load(str);

Aren't they? They shouldn't cause an error later on, after loading has already finished.

+1  A: 

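This is because `Load` is an iterator method (it uses `yield return`), so its body, including the `wc.OpenRead` call, does not execute until you actually enumerate the result. That happens in the foreach, which is after the try block.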
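
A minimal sketch of that deferred behaviour, not the actual SharpQuery code: `LoadLazy` and `LoadEager` are hypothetical stand-ins for `Load`, and an `InvalidOperationException` is thrown in place of the real `WebException` so the example runs without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Hypothetical stand-in for SharpQuery.Load: an iterator method whose
    // body (including the throw) runs only when the result is enumerated.
    static IEnumerable<string> LoadLazy(bool fail)
    {
        if (fail)
            throw new InvalidOperationException("simulated 404");
        yield return "document";
    }

    // Eager variant: there is no yield here, so the work happens during the call.
    static IEnumerable<string> LoadEager(bool fail)
    {
        if (fail)
            throw new InvalidOperationException("simulated 404");
        return new[] { "document" };
    }

    static void Main()
    {
        IEnumerable<string> lazy = null;
        try
        {
            lazy = LoadLazy(true);   // does not throw: the body has not run yet
            Console.WriteLine("LoadLazy returned without throwing");
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("not reached");
        }

        try
        {
            lazy.ToList();           // enumeration runs the body, so it throws here
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine("caught during enumeration: " + e.Message);
        }

        try
        {
            LoadEager(true);         // throws immediately, inside this try
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine("caught at call site: " + e.Message);
        }
    }
}
```

Under these assumptions, the same idea applied to the question would be either to force evaluation inside the existing try block (for example with `.ToList()`), or to have `Load` do the download in a non-iterator method and return the node from there.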

Shiraz Bhaiji
Because it's an enumerable, eh? Didn't realize they worked like that. Calling `.ToArray()` or something would force it to evaluate, no?
Mark
Yes, or you could just move the foreach into the try block (if that is what is accessing it).
Shiraz Bhaiji
Yes... the foreach is what's causing the access/evaluation, but I think I'd rather catch the error early, where it's actually occurring, rather than sometime down the road. That's really crazy though... that you can propagate errors like that. Looping over an enumerable seems like it should be safe.
Mark