views:

169

answers:

2
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(baseurl + url);
req.Timeout = 1000 * 10;
HttpWebResponse response = (HttpWebResponse)req.GetResponse();
Stream str = response.GetResponseStream();
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(str);
response.Close();
string imgurl = doc.DocumentNode.SelectSingleNode("//div[@class='one-page']/a/img[@class='manga-page']").Attributes["src"].Value;
req = (HttpWebRequest)HttpWebRequest.Create(imgurl);
req.Timeout = 1000 * 10;
response = (HttpWebResponse)req.GetResponse();
str = response.GetResponseStream();
Image img = Image.FromStream(str);
response.Close();
return img;

I run this code in a loop (using several threads) to download about 4000 images. It works brilliantly for the first few hundred, but then (at a different point in time on every attempt) it suddenly stops working, and every call to "req.GetResponse()" results in a TimeoutException. I have no idea why this happens, what might be wrong, or how to deal with it. Any help would be highly appreciated.

The code I use to run this function (it's called GetPage(int) and invoked as c.GetPage(t)) is as follows:

for (int j = 0; j < 2; j++)
{
    BackgroundWorker bw = new BackgroundWorker();
    num[bw] = j;
    bgs.Add(bw);
    bw.DoWork += (object sender, DoWorkEventArgs doargs) =>
    {
        int t = -1;
        lock (lockObjForQueueOperations)
        {
            if (images.Count != 0)
                t = images.Dequeue();
        }
        if(t < 0)
        {
            doargs.Result = false;
            return;
        }
        currently[sender] = t;
        Image img;
        try { img = c.GetPage(t); }
        catch (Exception e)
        {
            lock (lockObjForQueueOperations)
            {
                images.Enqueue(t);
            }
            lock (Console.Title)
            {
                if (num[sender] == 0) Console.ForegroundColor = ConsoleColor.Cyan;
                else if (num[sender] == 1) Console.ForegroundColor = ConsoleColor.Yellow;
                Console.WriteLine("**ERR: Error fetching page {0}, errormsg: {1}", t, e.Message);
                Console.ForegroundColor = ConsoleColor.White;
            }
            doargs.Result = true;
            Thread.Sleep(1000*2);
            return;
        }
        lock (Console.Title)
        {
            if (num[sender] == 0) Console.ForegroundColor = ConsoleColor.Cyan;
            else if (num[sender] == 1) Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("\t\tLoaded page {0} of {1}.", t + 1, c.PagesCount);
            Console.ForegroundColor = ConsoleColor.White;
        }
        string imgpath = Path.Combine(ndir, "Page " + (t + 1) + ".png");
        img.Save(imgpath, System.Drawing.Imaging.ImageFormat.Png);
        img.Dispose();
        doargs.Result = true;
    };
    bw.RunWorkerCompleted += (object sender, RunWorkerCompletedEventArgs runargs) =>
    {
        if ((bool)runargs.Result) bw.RunWorkerAsync();
        else
        {
            finnishedworkers++;
            if (finnishedworkers == 2) restetter.Set();
            bw.Dispose();
        }
    };
    bw.RunWorkerAsync();
}
+2  A: 

The Timeout property on HttpWebRequest is in milliseconds, so your current setting of 10,000 is only 10 seconds. Depending on bandwidth, the size of the data being pulled, and the complexity of the code being run, that might not be enough. I'd try increasing it first.
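For reference, a longer timeout on the request in the question might look like this (the 60-second values are illustrative, not prescriptive):

```csharp
// Timeout is given in milliseconds; 10 seconds can be too short under load.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baseurl + url);
req.Timeout = 1000 * 60;          // wait up to 60 s for the response to start
req.ReadWriteTimeout = 1000 * 60; // also applies to reads from the response stream
```

ReadWriteTimeout matters here too, since the image is read from the response stream after GetResponse() returns.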

Aaron
The resources are loaded within a second. I lowered the timeout because I was tired of waiting for my timeout exceptions.
Alxandr
Also, I tried running the code without setting the timeout, but then it just took longer for the exceptions to show up (nothing happened at all while I waited).
Alxandr
+1  A: 

You have a bad design. Instead of creating a thread for every request, call BeginGetResponse. The framework will handle allocating threads from the thread pool to service your requests.

Set ServicePointManager.DefaultConnectionLimit to a number like 100.

Create a semaphore with a count matching that connection limit.

In the function that calls BeginGetResponse, call semaphore.WaitOne() just before the call to BeginGetResponse.

In your EndGetResponse handler, call semaphore.Release() to allow the next request to continue.
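The steps above might be sketched like this (the limit of 100 and the URL list are assumptions, and error handling and the actual image saving are elided):

```csharp
using System;
using System.Net;
using System.Threading;

class Downloader
{
    // Allow as many in-flight requests as connections (100 is an assumption).
    const int Limit = 100;
    static readonly Semaphore Gate = new Semaphore(Limit, Limit);

    static void Main()
    {
        ServicePointManager.DefaultConnectionLimit = Limit;
        string[] urls = { /* ... your ~4000 image URLs ... */ };
        foreach (string url in urls)
        {
            Gate.WaitOne(); // block until a connection slot is free
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.BeginGetResponse(OnResponse, req);
        }
        // A real program must wait for the callbacks to finish
        // (e.g. with a CountdownEvent) before Main returns.
    }

    static void OnResponse(IAsyncResult ar)
    {
        var req = (HttpWebRequest)ar.AsyncState;
        try
        {
            using (var response = (HttpWebResponse)req.EndGetResponse(ar))
            using (var stream = response.GetResponseStream())
            {
                // read/save the image here; disposing the response
                // returns the connection to the pool
            }
        }
        finally
        {
            Gate.Release(); // let the next queued request proceed
        }
    }
}
```

Note the using blocks: an undisposed HttpWebResponse keeps its connection out of the pool, which by itself can make later requests time out.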

You are probably exhausting the thread pool with all of your own threads. Monitor your process and see if you can't service everything with only 5-10 threads total, ever. You could log Thread.CurrentThread.ManagedThreadId to see how the SAME thread handles multiple requests.

Done this billions of times. Really.

No Refunds No Returns
I don't think I'm exhausting the thread pool with only 3 threads running at any given time? Or does the BackgroundWorker create a new thread every time RunWorkerAsync is called?
Alxandr