I'm developing a scraper that collects the links on one web page, then creates threads that scrape each linked subpage.

This is what a thread does:

' Download the page at _Address and keep its text in _Content.
' Using disposes each object even if an exception is thrown.
Using client As New WebClient()
    Using stream As Stream = client.OpenRead(_Address)
        Using streamReader As New StreamReader(stream, True)
            _Content = streamReader.ReadToEnd()
        End Using
    End Using
End Using

I've noticed that a thread sometimes throws an exception, usually when many threads are running simultaneously. It happens randomly: the exception is thrown at client.OpenRead and says "Value cannot be null. Parameter name: address". The call is wrapped in a try..catch, so I put a breakpoint in the catch block, and _Address appears to be valid at that point, yet the code still throws.

_Address is a protected class field and cannot be accessed by other threads.

The exact message is:

"Value cannot be null. Parameter name: address".

The exception is System.ArgumentNullException.

Stack trace is:

at System.Net.WebClient.OpenRead(String address)
at MyAppName.Scraper.Scrape() in MyAppFolder\Scraper.vb:line 96

Do you have any suggestions for fixing the issue? Thank you in advance.

+2  A: 

The internal implementation for WebClient.OpenRead(string address) is just:

public Stream OpenRead(string address)
{
    if (address == null)
    {
        throw new ArgumentNullException("address");
    }
    return this.OpenRead(this.GetUri(address));
}

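So _Address must be null at the moment it gets passed in. A breakpoint in the catch block only shows the value after the exception was thrown, so the field may well have been assigned in the meantime.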

Maybe try something like this:

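private string _address;
private string _Address
{
    get
    {
        // Reading before any assignment is a state error, not a bad argument.
        if(_address == null)
            throw new InvalidOperationException("_Address was never set and is still null!");
        return _address;
    }
    set
    {
        // First argument is the parameter name, second is the message.
        if(value == null)
            throw new ArgumentNullException("value", "_Address cannot be null!");
        _address = value;
    }
}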

So basically, if something tries to set _Address to null, you will get an error right when it happens, and the call stack will show where it is being set to null.
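Since the question is tagged VB.NET, here is a rough VB.NET sketch of the same idea. Treat it as an assumption-laden outline rather than a drop-in fix: the class layout is guessed from the stack trace (Scraper, Scrape), and the backing field name _addressValue is made up, because VB.NET is case-insensitive and cannot have both _address and _Address in one class. The Scrape method also copies the property into a local variable first, so the value that was checked is exactly the value handed to OpenRead.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Public Class Scraper
    ' Hypothetical backing field; only the property below touches it directly.
    Private _addressValue As String
    Protected _Content As String

    ' Guarded replacement for the plain _Address field.
    Protected Property _Address As String
        Get
            If _addressValue Is Nothing Then
                Throw New InvalidOperationException("_Address was never set and is still null!")
            End If
            Return _addressValue
        End Get
        Set(ByVal value As String)
            If value Is Nothing Then
                ' First argument is the parameter name, second is the message.
                Throw New ArgumentNullException("value", "_Address cannot be null!")
            End If
            _addressValue = value
        End Set
    End Property

    Public Sub Scrape()
        ' Read the property once so the checked value is the one passed on.
        Dim address As String = _Address
        Using client As New WebClient()
            Using stream As Stream = client.OpenRead(address)
                Using reader As New StreamReader(stream, True)
                    _Content = reader.ReadToEnd()
                End Using
            End Using
        End Using
    End Sub
End Class
```

If the getter throws inside Scrape, the thread was started before the address was assigned; if the setter throws, the call stack points at whatever code is assigning null.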

rally25rs
I upvoted you because that is what I would do: check for null values and either throw an exception immediately or fall back to a default value if the logic calls for it. Just as a courtesy for the future, please convert to VB.NET before posting (with a converter or whatever), since this was tagged as 'VB.NET'. Thanks.
atconway