views:

252

answers:

2

I have a threaded console application that is working fine, but it's architecture needs to be improved, and I'd like some feedback.

Currently, the program loads up a list of data, and segments that data into partitions (one chunk for each thread). The program then initializes a new thread using the ThreadPool, and passes it ONE segment of the partitioned data on which to operate.

Everything works nicely...except:

Some of the threads fail...due to network problems, or unrecoverable exceptions. This is expected behavior and not a bug.

I now need a way (if the thread fails) to recover that thread's segment of data and provide it to another working thread so that it doesn't become orphaned. I'm sure there are ways to do this, ie, sharing data between threads, etc, but I think there's a better approach.

Instead of segmenting the data beforehand and passing it to each thread, I could share ONE static collection of this data between all threads. This is more elegant, but introduces newfound sync problems that the old approach didn't have to worry about.

A.) What are your thoughts on this approach vs the old one?
B.) If this approach is a good one, how do I go about locking access to the shared static collection.

When the thread inits, I can lock the collection and pop-off a segment of the data just for that thread. The static collection would now be REDUCED by the amount popped off for that thread. Upon FAILURE of the thread, I could reallocate that segment of data to the shared collection by again locking it, and pushing the data back to the collection for other threads to attempt to process.

For example: (untested pseudocode)

void Process(object threadInfo)
{
  lock(StaticCollection)
  {
    var segment = StaticCollection.Take(100);
    StaticCollection.Remove(StaticCollection.Where(item => segment.Contains(item)))
  }

  foreach(var seg in segment)
  {
    // do something
  }

  // reallocate the thread's data on failure
  if(unrecoverableErrorOccurred)
  {
    lock(StaticCollection)
    {
      StaticCollection.Add(segment);
    }
  }
}

Am I on the right track with this? It seems to me that one thread can remove items at the same time another thread is reallocating items...or does a lock on a STATIC collection mean that no other thread can access that collection at all. So, Thread A.) obtained a lock in the FIRST part of the method, would that block all other threads from executing the LAST part of the method until ThreadA was complete?

+3  A: 

Let's separate out a few things here...

Firstly, you're not actually locking the collection. You're locking the monitor associated with an object. I personally think it was a mistake that .NET followed Java in giving each object an associated monitor to lock on, but let's leave that to one side. Personally I prefer to have objects and associated variables purely for locking - so in my code you may see:

private readonly object padlock = new object();

This ensures that no other code is going to try to acquire that lock, because they won't know about the object.

Secondly, locks are advisory. This is part of the business of "you're not locking the collection." If the collection itself synchronizes on the same lock - and the non-generic collections have a Synchronized method for this very purpose - but basically unless something somewhere explicitly takes out a lock, you won't get synchronization.

Thirdly, yes, the two locked blocks shown in your code are using the same lock (assuming the value of StaticCollection doesn't change, of course). If one thread is busy calling Remove, that will stop any other thread from calling Add at the same time, because they each need to have the lock. That's probably what you want.

I personally wouldn't make it a genuinely static collection though (or rather, I wouldn't use a StaticCollection variable). I would give each task a reference to the same collection (and a reference to the associated lock; in fact I'd probably encapsulate the collection, the synchronization and the "get me a bunch of work" and "here's a bunch of work to have back" bits in a separate class). That will make it simpler to test and generally nicer logically. It also means you could have two separate "sets" of threads working on different collections at the same time... which could be useful if you make the above encapsulation generic, so they could be performing radically different tasks...

Jon Skeet
Thanks a lot for the insights.
Scott
A: 

You might consider using a Queue to hold unprocessed chunks, and as Jon Skeet says, lock on a neutral object, and hold the lock only long enough to access the Queue. I've used this approach with many threads and it worked well for me.

ebpower