My program has a list of 200k files, and I have to import each one into the database. It takes a long time, so I started researching multithreading as a way to speed up the import. I finally got to an implementation, but I'm not sure it's actually working.

Using http://stackoverflow.com/questions/2702545/workaround-for-the-waithandle-waitall-64-handle-limit as a sample for my C# code, I came up with:

 int threadCount = 0;       

 for (int i = 0; i < this.Total; i++)
 {
       Finished = new ManualResetEvent(false);
       threadCount = this.ThreadCount;
       Interlocked.Increment(ref threadCount);

       FileHandler fh = new FileHandler(finished, sorted[i], this.PicturesFeatures, this.Outcome, this.SiteIds, this.LastId, this.Order, this.ThreadCount);
       Console.Write(i + " ");
       ThreadPool.QueueUserWorkItem(new WaitCallback(HandleFile), fh);
       Console.Write(i + " ");
       Finished.WaitOne();
 }

And HandleFile() goes as:

 private void HandleFile(object s)
    {           
        try
        {
            //code        
        }
        finally
        {
            if (Interlocked.Decrement(ref threadCount) == 0)
            {
                Finished.Set();
            }
        }
    }

I added those Console.Write calls thinking that if one work item took longer, it would finish after a later one ("0 0 1 2 2 1 3 3 ..."), but the output is always in order ("0 0 1 1 2 2 3 3 4 4 ...").

+4  A: 

Your output is to be expected. You're writing the output from the main thread. QueueUserWorkItem does not block; it only registers your HandleFile function to be executed on a separate thread. So regardless of how long the work items take, your prints happen in the expected order, because they all come from the main thread.

Additionally, you're not getting the benefit of parallelism with this code because you're waiting after every item you submit. You're essentially saying "I won't submit my next work item until the last one is finished." This is just a more complicated way of writing normal serialized code. To introduce parallelism, you need to add multiple items to the queue without waiting in between.
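
As a rough illustration of "queue everything, then wait once", here is a minimal sketch. It swaps the question's ManualResetEvent plus counter for a CountdownEvent (available from .NET 4), which is my substitution and not something from the question. The names QueueThenWaitSketch, RunAll and work are hypothetical, and work stands in for whatever the real import does.

```csharp
using System;
using System.Threading;

static class QueueThenWaitSketch
{
    // Queues `total` work items up front, then blocks once until all of them
    // have signalled. Returns how many items ran to completion.
    // Assumption: `work` does not throw; an unhandled exception on a pool
    // thread would normally bring the process down.
    public static int RunAll(int total, Action<int> work)
    {
        if (total <= 0) return 0;

        int completed = 0;
        using (var remaining = new CountdownEvent(total))
        {
            for (int i = 0; i < total; i++)
            {
                int captured = i; // copy the loop variable for the closure
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        work(captured);
                        Interlocked.Increment(ref completed);
                    }
                    finally
                    {
                        remaining.Signal(); // one signal per work item
                    }
                });
            }

            // The only wait, placed after the loop rather than inside it.
            remaining.Wait();
        }
        return completed;
    }
}
```

Calling something like QueueThenWaitSketch.RunAll(files.Count, i => Import(files[i])) would then return only after every item has been handled, where files and Import are placeholders for your own list and import routine.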

bshields
+1  A: 

First off, threads spawned from a multithreaded application are NOT guaranteed to finish in any particular order. You may have started one thread first, but it may not necessarily finish first.

With that said, you can use Process Explorer: http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

Process Explorer will show you which threads your program is spawning.

icemanind
+2  A: 

Break into the execution while it's running (Ctrl+Alt+Break) and then take a look at the Threads window (Debug -> Windows -> Threads).

Phil
OK. I have the Main Thread, 5 worker threads with <No Name>, 1 worker thread called .NET SystemEvents, and another worker thread called vshost.RunParkingWindow. I compared it with a Hello World program, and there doesn't seem to be much multithreading going on.
EduardoMello
How quickly do your threads execute? Try putting a breakpoint at a spot where you can be fairly sure multiple threads are running concurrently. You can also double-click an item in the Threads window to jump to that thread's current point of execution.
Phil
+2  A: 

You have a couple of problems.

  • The work items are effectively serialized, since you wait for each one to complete before starting the next.
  • The Console.Write calls are on the main thread, so it is natural for them to report i incrementing in order.

Here is the canonical pattern for doing this correctly.

int count = TOTAL_ITERATIONS;
var finished = new ManualResetEvent(false);
for (int i = 0; i < TOTAL_ITERATIONS; i++) 
{ 
  int captured = i; // Use this for variable capturing in the anonymous method.
  ThreadPool.QueueUserWorkItem(
    delegate(object state)
    {
      try
      {
        Console.WriteLine(captured.ToString());
        // Your task goes here.
        // Refer to 'captured' instead of 'i' if you need the loop variable.
        Console.WriteLine(captured.ToString());
      }
      finally
      {
        if (Interlocked.Decrement(ref count) == 0)
        {
          finished.Set();
        }
      }
    });
}
finished.WaitOne();

Edit: To easily demonstrate that multiple threads are invoked, use the following code.

public static void Main()
{
    const int WORK_ITEMS = 100;
    int count = WORK_ITEMS;
    var finished = new ManualResetEvent(false);
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId.ToString() + ":Begin queuing...");
    for (int i = 0; i < WORK_ITEMS; i++)
    {
        int captured = i; // Use this for variable capturing in the anonymous method. 
        ThreadPool.QueueUserWorkItem(
          delegate(object state)
          {
              try
              {
                  Console.WriteLine(Thread.CurrentThread.ManagedThreadId.ToString() + ":" + captured.ToString());
                  for (int j = 0; j < 100; j++) Thread.Sleep(1);
                  Console.WriteLine(Thread.CurrentThread.ManagedThreadId.ToString() + ":" + captured.ToString());
              }
              finally
              {
                  if (Interlocked.Decrement(ref count) == 0)
                  {
                      finished.Set();
                  }
              }
          });
    }
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId.ToString() + ":...end queueing");
    finished.WaitOne();
    Console.ReadLine();
}
Brian Gideon
I've made the changes you proposed. Then I used the Threads window (as Phil suggested in his answer) and, as I told Phil, it doesn't seem to have much effect.
EduardoMello
@EduardoMello: It may be because the `ThreadPool` is choosing to execute the work items serially on only one thread from the pool. I edited my answer to include code that coerces it to assign the work items to different threads and demonstrates that it is indeed doing so by writing the thread id to the console.
Brian Gideon
+1  A: 

The information that you're outputting is all coming from the same thread (the one running your loop). If you want to see evidence of multiple threads, you can output the thread name or some other value from your HandleFile function.
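
One way that might look, as a sketch only: the callback keeps the HandleFile(object) shape from the question, but WorkItem, ThreadOfItem and Run are hypothetical names I made up, the Thread.Sleep is a placeholder for the real import, and a CountdownEvent is assumed for the final wait.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

static class ThreadIdSketch
{
    // Hypothetical state object; it stands in for the question's FileHandler.
    sealed class WorkItem
    {
        public int Index;
        public CountdownEvent Remaining;
        public ConcurrentDictionary<int, int> ThreadOfItem;
    }

    // Same shape as the question's HandleFile: runs on a pool thread and
    // reports which thread it is running on.
    static void HandleFile(object state)
    {
        var item = (WorkItem)state;
        try
        {
            int threadId = Thread.CurrentThread.ManagedThreadId;
            item.ThreadOfItem[item.Index] = threadId;
            Console.WriteLine("thread " + threadId + " handling item " + item.Index);
            Thread.Sleep(10); // placeholder for the real import work
        }
        finally
        {
            item.Remaining.Signal();
        }
    }

    // Queues `total` items, waits for all of them, and returns a map from
    // item index to the managed id of the thread that handled it.
    public static ConcurrentDictionary<int, int> Run(int total)
    {
        var threadOfItem = new ConcurrentDictionary<int, int>();
        using (var remaining = new CountdownEvent(total))
        {
            for (int i = 0; i < total; i++)
            {
                var item = new WorkItem { Index = i, Remaining = remaining, ThreadOfItem = threadOfItem };
                ThreadPool.QueueUserWorkItem(new WaitCallback(HandleFile), item);
            }
            remaining.Wait();
        }
        return threadOfItem;
    }
}
```

If the printed thread ids differ between items, more than one pool thread is doing the work. How many distinct ids show up depends on the pool and the machine, so a single id on a short run does not prove much either way.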

msergeant