views:

216

answers:

4

I have an application that has many cases. Each case has many multipage tif files. I need to covert the tf files to pdf file. Since there are so many file, I thought I could thread the conversion process. I'm currently limiting the process to ten conversions at a time (i.e ten treads). When one conversion completes, another should start.

This is the current setup I'm using.

private void ConvertFiles()
{
  List<AutoResetEvent> semaphores = new List<AutoResetEvet>();
  foreach(String fileName in filesToConvert)
  {
    String file = fileName;

    if(semaphores.Count >= 10)
    {
      WaitHandle.WaitAny(semaphores.ToArray());
    }


    AutoResetEvent semaphore = new AutoResetEvent(false);
    semaphores.Add(semaphore);

    ThreadPool.QueueUserWorkItem(
      delegate
      { 
        Convert(file);
        semaphore.Set();
        semaphores.Remove(semaphore);
      }, null);
  }

  if(semaphores.Count > 0)
  {
    WaitHandle.WaitAll(semaphores.ToArray());
  }
}

Using this, sometimes results in an exception stating the WaitHandle.WaitAll() or WaitHandle.WaitAny() array parameters must not exceed a length of 65. What am I doing wrong in this approach and how can I correct it?

+1  A: 

Looks like you need to remove the handle the triggered the WaitAny function to proceed

if(semaphores.Count >= 10)
{
  int index = WaitHandle.WaitAny(semaphores.ToArray());
  semaphores.RemoveAt(index);
}

So basically I would remove the:

semaphores.Remove(semaphore);

call from the thread and use the above to remove the signaled event and see if that works.

SwDevMan81
I had thought about doing this as well. However, the documentation for WaitHandle.WaitAny stated that multiple handles could set and that only the lowest index would be returned. How would I know to remove the other handles that may have been set during the call?
JWilliams
@JWilliams - Multiple handles could be set and the lowest index will be returned. The next iteration, if another handle is set, it will return right away with that index. If there is an already signaled event, it returns right away with that index. So this will be ok. The next iteration will immediately remove the handle
SwDevMan81
I ended up performing the remove using this style. I find it much more readable then the previous way I was doing it. Thank you for pointing it out.
JWilliams
Ok great, feel free to accept the answer if this is what you go with :)
SwDevMan81
+1  A: 

Are you using a real semaphore (System.Threading)? When using semaphores, you typically allocate your max resources and it'll block for you automatically (as you add & release). You can go with the WaitAny approach, but I'm getting the feeling that you've chosen the more difficult route.

Pestilence
I am not. I'm actually using the AutoResetEvent with WaitHandle calls. I will read over the actual Semaphore class.
JWilliams
A: 

Maybe you shouldn't create so many events?

// input
var filesToConvert = new List<string>();
Action<string> Convert = Console.WriteLine;

// limit
const int MaxThreadsCount = 10;

var fileConverted = new AutoResetEvent(false);
long threadsCount = 0;

// start
foreach (var file in filesToConvert) {
    if (threadsCount++ > MaxThreadsCount) // reached max threads count 
        fileConverted.WaitOne();          // wait for one of started threads

    Interlocked.Increment(ref threadsCount);

    ThreadPool.QueueUserWorkItem(
        delegate {
            Convert(file);

            Interlocked.Decrement(ref threadsCount);
            fileConverted.Set();
        });
}

// wait
while (Interlocked.Read(ref threadsCount) > 0) // paranoia?
    fileConverted.WaitOne();
SpeCT
+1  A: 

There are a few problems with what you have written.

1st, it isn't thread safe. You have multiple threads adding, removing and waiting on the array of AutoResetEvents. The individual elements of the List can be accessed on separate threads, but anything that adds, removes, or checks all elements (like the WaitAny call), need to do so inside of a lock.

2nd, there is no guarantee that your code will only process 10 files at a time. The code between when the size of the List is checked, and the point where a new item is added is open for multiple threads to get through.

3rd, there is potential for the threads started in the QueueUserWorkItem to convert the same file. Without capturing the fileName inside the loop, the thread that converts the file will use whatever value is in fileName when it executes, NOT whatever was in fileName when you called QueueUserWorkItem.

This codeproject article should point you in the right direction for what you are trying to do: http://www.codeproject.com/KB/threads/SchedulingEngine.aspx

EDIT:

var semaphores = new List<AutoResetEvent>();
        foreach (String fileName in filesToConvert)
        {
            String file = fileName;
            AutoResetEvent[] array;
            lock (semaphores)
            {
                array = semaphores.ToArray();
            }
            if (array.Count() >= 10)
            {
                WaitHandle.WaitAny(array);
            }

            var semaphore = new AutoResetEvent(false);
            lock (semaphores)
            {
                semaphores.Add(semaphore);
            }
            ThreadPool.QueueUserWorkItem(
              delegate
              {
                  Convert(file);
                  lock (semaphores)
                  {
                      semaphores.Remove(semaphore);
                  }
                  semaphore.Set();
              }, null);
        }

Personally, I don't think I'd do it this way...but, working with the code you have, this should work.

Eric Dahlvang
1) When I wrapped all calls to the semaphores list into lock statements, a dead lock occurred and not one file was converted. I used the semaphores object as the locking object.2) While the locking of the list not fix this?3) I forgot a line that copies the fileName value into another string. You may have noticed that in the QueueUserWorkItem the variable containing the file name was actually file instead of fileName. That wasn't a typo, atleast not when the forgotten line was added back it. That resolves this problem, correct?
JWilliams
Yes, your change fixes the captured variable issue. I've edited my answer to include a locking strategy that makes the code thread safe.
Eric Dahlvang
I;m not sure what caused my deadlock using the semaphores list, but once I changed the lock object to a standalone static object, this seems to have worked. I was so close to this before I posted the question. Also, you said you wouldn't do it this way. Is there a better way?
JWilliams