I have a download worker that uses ThreadPool threads to download files. After extending it to run some Selenium tests on the downloaded files, I constantly get timeout exceptions in the file downloaders and delays in running the Selenium tests. More precisely:

  • When the program starts, the download threads start downloading and a couple of pages are processed via Selenium without problems.
  • Shortly after, the first download threads start throwing timeout exceptions from HttpWebRequest.
  • At the same time, commands stop flowing to Selenium (as observed in the SeleniumRC log), but the thread running Selenium does not get any exception.
  • This situation holds as long as there are entries in the download list: new download threads are started and terminate after receiving timeouts (without trying to lock Selenium).
  • As soon as no more download threads are being started, Selenium starts receiving commands again and the threads waiting for the lock are processed sequentially, as designed.

Now here's the download code:

HttpWebRequest request = (HttpWebRequest) WebRequest.Create(uri);
request.ServicePoint.ConnectionLimit = MAX_CONNECTIONS_PER_HOST;
// The using blocks close and dispose the response, stream and reader,
// even if reading throws, so no explicit Close()/Dispose() calls are needed
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader sr = new StreamReader(stream))
{
    // Read the stream...
}

And this is how Selenium is used afterwards in the same thread:

lock(SeleniumLock)
{
    selenium.Open(url);
    // Run some Selenium commands, but no selenium.Stop()
}

Here, selenium is a static variable that is initialized in the static constructor of the class (via selenium.Start()).

I assumed I was running into the CLR connection limit, so I added these lines during initialization:

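int maxWorkerThreads, maxCompletionPortThreads;
ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxCompletionPortThreads);
// Allow one connection per worker thread...
HttpUtility.MAX_CONNECTIONS_PER_HOST = maxWorkerThreads;
// ...plus one for the connection to SeleniumRC
System.Net.ServicePointManager.DefaultConnectionLimit = maxWorkerThreads + 1;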

The + 1 is for the connection to SeleniumRC, based on my guess that the Selenium client code also uses HttpWebRequest. It seems I'm still running into some kind of deadlock, although the threads waiting for the Selenium lock do not hold any resources.

Any ideas on how to get this working?

A: 

After digging deeper into this, I figured out that the issue is not related to connections but to the interaction between the ThreadPool and HttpWebRequest: at the point where the downloaders start to experience timeouts, ThreadPool.GetAvailableThreads() reports 0 or -1 available worker threads. I deliberately chose to use HttpWebRequest synchronously to ensure this would not happen. Presumably the Selenium client driver uses the asynchronous methods instead, which leads to this kind of "thread deadlock".
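
To check whether the pool is really exhausted when the timeouts start, something like the following could be logged before each download. This is only a minimal sketch: the class name ThreadPoolProbe and the output format are made up for illustration, while ThreadPool.GetAvailableThreads() and ThreadPool.GetMaxThreads() are the standard .NET calls.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of the original program)
static class ThreadPoolProbe
{
    // Returns available/maximum counts for worker and I/O completion port threads
    public static string Snapshot()
    {
        int availWorkers, availIo, maxWorkers, maxIo;
        ThreadPool.GetAvailableThreads(out availWorkers, out availIo);
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        return string.Format("workers {0}/{1}, I/O {2}/{3}",
                             availWorkers, maxWorkers, availIo, maxIo);
    }
}
```

Calling Console.WriteLine(ThreadPoolProbe.Snapshot()) at the start of each work item should show the available worker count dropping towards 0 shortly before the first timeouts, if the diagnosis above applies.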

I am not sure what the best way to solve this would be, but this workaround, a replacement for ThreadPool.QueueUserWorkItem(), at least makes the program usable:

protected void QueueWorkItem(WaitCallback callBack, object state)
{
    // Wait until at least 10 worker threads are free
    // (Selenium's async I/O shares the ThreadPool and deadlocks otherwise)
    int workerThreads, completionPortThreads;
    while (true)
    {
        ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
        if (workerThreads >= 10) break;
        Thread.Sleep(250);
    }
    // Queue the work item and count it only if queuing succeeded
    if (ThreadPool.QueueUserWorkItem(callBack, state)) Interlocked.Increment(ref WorkItemCount);
}
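
A possible alternative to polling, which I have not tried in the original program, is to cap the number of work items in flight with a semaphore, so that the downloads can never occupy the whole pool. This is a different technique from the wrapper above: it does not look at the number of free threads at all, and it assumes you pick a cap well below the pool's maximum. The class name ThrottledQueue and the parameter maxConcurrent are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical helper: limits how many work items are queued or running at once
class ThrottledQueue
{
    private readonly Semaphore slots;

    public ThrottledQueue(int maxConcurrent)
    {
        slots = new Semaphore(maxConcurrent, maxConcurrent);
    }

    public void QueueWorkItem(WaitCallback callBack, object state)
    {
        // Blocks the caller while maxConcurrent items are already in flight
        slots.WaitOne();
        bool queued = false;
        try
        {
            queued = ThreadPool.QueueUserWorkItem(delegate(object s)
            {
                // Free the slot when the work item ends, even if it throws
                try { callBack(s); }
                finally { slots.Release(); }
            }, state);
        }
        finally
        {
            // Give the slot back if the item never reached the pool
            if (!queued) slots.Release();
        }
    }
}
```

With, say, new ThrottledQueue(maxWorkerThreads - 10), the effect should be similar to the "keep 10 threads free" rule above, as long as nothing else in the process queues a lot of work to the ThreadPool.
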
domsom