I've read and looked at quite a few examples of thread pooling, but I just can't seem to understand them the way I need to. What I have managed to get working is not really what I need; it just runs the function in its own thread.

public static void Main()
{
    while (true)
    {
        try
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(Process));
            Console.WriteLine("ID has been queued for fetching");
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error: " + ex.Message);
        }
        Console.ReadLine();
    }
}

public static void Process(object state)
{

    var s = StatsFecther("byId", "0"); //returns all player stats
    Console.WriteLine("Account: " + s.nickname);
    Console.WriteLine("ID: " + s.account_id);
    Console.ReadLine();
}

What I'm trying to do is have about 50 threads (maybe more) fetching serialized PHP data containing player stats, starting from user ID 0 all the way up to an ID I specify (300,000). My question is not about how to fetch the stats (I know how to fetch and read them), but how to write a thread pool that keeps fetching stats until it reaches the 300,000th user ID without stepping on the toes of the other threads, and saves the stats to a database as it retrieves them.

A: 

How do you determine the user ID? One option is to segment the IDs across the threads so that thread X deals with IDs from 0 to N, and so on, as a fraction of how many threads you have.
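
A minimal sketch of that segmentation, assuming 50 workers over IDs 0 to 300,000 and a FetchAndSave(int id) placeholder standing in for the existing fetch and database code:

const int maxId = 300000;
const int workerCount = 50;
int chunk = maxId / workerCount;

for (int w = 0; w < workerCount; w++)
{
    int start = w * chunk;                                     // this worker's first ID
    int end = (w == workerCount - 1) ? maxId : start + chunk;  // exclusive upper bound
    ThreadPool.QueueUserWorkItem(_ =>
    {
        for (int id = start; id < end; id++)
            FetchAndSave(id); // placeholder: fetch stats for id and write them to the database
    });
}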

Noon Silk
The user ID will just start at 0 and go up after each Fetch.
Fatal510
But can you request a given user by knowing an incrementing ID? If so, do it as I described.
Noon Silk
+2  A: 
static int _globalId = 0;

public static void Process(object state)
{
    // each queued Process call gets its own player ID to fetch
    int processId = Interlocked.Increment(ref _globalId);
    var s = StatsFecther("byId", processId.ToString()); // returns all player stats

    Console.WriteLine("Account: " + s.nickname);
    Console.WriteLine("ID: " + s.account_id);
    Console.ReadLine();
}

This is the simplest thing to do, but it is far from optimal: you are using synchronous calls, you are relying on the ThreadPool to throttle your call rate, you have no retry policy for failed calls, and your application will behave extremely badly under error conditions (when the web calls are failing).

First, you should consider using the asynchronous methods of WebRequest: BeginGetRequestStream (if you POST and have a request body) and/or BeginGetResponse. These methods scale much better, and you'll get higher throughput for less CPU (if the back end can keep up, of course).
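
For example, a rough sketch of one fetch using BeginGetResponse (the stats URL and the HandleStats helper are assumptions standing in for the real fetch and parse code):

int id = 42; // whichever player ID this call is responsible for
var request = WebRequest.Create("http://example.com/stats.php?byId=" + id);
request.BeginGetResponse(ar =>
{
    try
    {
        using (var response = request.EndGetResponse(ar))
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string body = reader.ReadToEnd();
            HandleStats(id, body); // placeholder: parse the serialized PHP data and save it
        }
    }
    catch (WebException ex)
    {
        Console.WriteLine("Fetch failed for " + id + ": " + ex.Message);
    }
}, null);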

Second, you should consider self-throttling. On a similar project I used a pending-request count: on success, each call would submit two more calls, capped at the throttling count; on failure, the call would not submit anything. If no calls are pending, a timer-based retry submits a new call every minute. This way you only attempt once per minute when the service is down, saving your own resources from spinning without traction, and you ramp the throughput back up to the throttling cap when the service comes back up.
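
A sketch of that scheme (the cap of 50, the one-minute timer, and the BeginFetchNext/OnFetchCompleted hooks are all assumptions):

static int _pending = 0;
const int ThrottleCap = 50;
// retry timer: when nothing is pending (e.g. the service is down), try one call per minute
static Timer _retryTimer = new Timer(_ => { if (_pending == 0) TrySubmit(1); }, null, 60000, 60000);

static void TrySubmit(int count)
{
    for (int i = 0; i < count; i++)
    {
        if (Interlocked.Increment(ref _pending) > ThrottleCap)
        {
            Interlocked.Decrement(ref _pending); // cap reached, back off
            return;
        }
        BeginFetchNext(); // placeholder: start one asynchronous fetch
    }
}

static void OnFetchCompleted(bool succeeded)
{
    Interlocked.Decrement(ref _pending);
    if (succeeded)
        TrySubmit(2); // each success fans out two more calls, up to the cap
    // on failure, submit nothing; the timer above retries once per minute
}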

You should also know that the .NET Framework limits the number of concurrent connections it makes to any one resource. You must find your destination ServicePoint and change its ConnectionLimit from the default value (2) to the maximum you are willing to throttle at.
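
For example (the URI and the limit of 50 are assumptions; the limit should match whatever cap you throttle at):

// raise the per-host connection limit before queuing any work
ServicePoint sp = ServicePointManager.FindServicePoint(new Uri("http://example.com/"));
sp.ConnectionLimit = 50;
// or, to change it for every host: ServicePointManager.DefaultConnectionLimit = 50;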

About the database update part: there are way too many variables at play and way too little information to give any meaningful advice. Some general advice: use asynchronous methods for the database calls as well, size your connection pool to allow for your throttling cap, and make sure your updates use the player ID as a key so you don't deadlock updating the same record from different threads.
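
As one illustration only, an asynchronous upsert keyed on the player ID (the connection string, table, and column names are assumptions; Asynchronous Processing=true is needed for Begin/EndExecuteNonQuery here):

var conn = new SqlConnection(
    "Data Source=.;Initial Catalog=Stats;Integrated Security=true;" +
    "Asynchronous Processing=true;Max Pool Size=50");   // pool sized to the throttling cap
var cmd = new SqlCommand(
    "UPDATE PlayerStats SET Nickname = @nick WHERE AccountId = @id; " +
    "IF @@ROWCOUNT = 0 INSERT INTO PlayerStats (AccountId, Nickname) VALUES (@id, @nick);",
    conn);
cmd.Parameters.AddWithValue("@id", s.account_id);
cmd.Parameters.AddWithValue("@nick", s.nickname);
conn.Open();
cmd.BeginExecuteNonQuery(ar =>
{
    try { cmd.EndExecuteNonQuery(ar); }
    finally { cmd.Dispose(); conn.Dispose(); }   // clean up once the async call completes
}, null);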

Remus Rusanu
btw, if all you do is read a page from the web, parse it, and then write it to the database, then there should be no synchronization required because there is *nothing* shared.
Remus Rusanu
All of Remus' suggestions are good stuff. I had a similar challenge where I needed to dynamically throttle the number of HTTP connections to a server, and what worked best for me was to build a Queue of pending threads and then peel them off of the queue as slots became available. If one failed then it was inserted back into the queue. Each thread called a callback method when it finished, which I used to trigger the next thread(s) or requeue an errant thread.
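A sketch of that queue-and-slots idea (MaxSlots and the FetchAsync signature are assumptions; FetchAsync stands for any asynchronous fetch that reports success or failure through a callback):

static readonly Queue<int> _pendingIds = new Queue<int>();
static readonly object _queueLock = new object();
const int MaxSlots = 50;
static int _active = 0;

static void FillSlots()
{
    lock (_queueLock)
    {
        while (_active < MaxSlots && _pendingIds.Count > 0)
        {
            int id = _pendingIds.Dequeue();
            _active++;
            FetchAsync(id, succeeded =>          // placeholder: async fetch with a completion callback
            {
                lock (_queueLock)
                {
                    _active--;
                    if (!succeeded)
                        _pendingIds.Enqueue(id); // requeue an errant ID
                }
                FillSlots();                     // trigger the next request(s)
            });
        }
    }
}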
ebpower
you can set maxconnection in your app.config to get around the default limits.
ebpower