Let's say that what I want to do is to validate one million strings, and each validation takes a couple of seconds.

My approach:

I have an array of threads declared like this:

Thread[] workers = new Thread[50];

I don't have all the strings in an array up front; they are produced by some calculations, so I don't have all of them when the process starts. I do have a method that returns the next one:

public string next()
{
  //my code
}

I've been able to run all the 50 threads like this:

for (int x = 0; x < 50; x++)
{
    workers[x] = new Thread(new ParameterizedThreadStart(myMethod));
    workers[x].Start(next());
}

This quickly starts all 50 threads "at the same time", and my log (fed by myMethod) then gets 50 responses at almost the same time (within 1 to 1.5 seconds).

How do I get each thread that has just completed to run again with the next string, given that the Thread class doesn't expose a completion event or anything similar?

Note: I have done some performance tests, and I prefer to use the regular Threads rather than BackgroundWorkers.

Using C# in .NET 3.5.

+1  A: 

You can use your next() method the same way ADO.NET does or an enumeration does. Keep returning values until it's finished, then return null. Have your threads consume from the method in a while loop until the method returns null, then exit.

To clarify, there's some background work you'll have to do. You'll have to make your next() method thread-safe so you are always returning the next value without duplicates. You'll also have to pass the reference to the object rather than the output of the next() method. The thread-safe part is the only really complicated thing about it, and it just means you have to lock the part of your next() method that:

  • Determines the next string value to use
  • and updates any object state

Once the state is stable, you can release the lock and the next thread can get its string to work on.

Edit: This may still be the way to go, although I like the ThreadPool approach for simplicity. In this case, the code would be something like:

YourStringGenerator generator;
//instantiate generator
for (int x = 0; x < 50; x++) 
{ 
    workers[x] = new Thread(new ParameterizedThreadStart(myMethod)); 
    workers[x].Start(generator); 
}

then

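void myMethod(object state)
{
    //ParameterizedThreadStart hands the argument over as object, so cast it back
    YourStringGenerator generator = (YourStringGenerator)state;
    String compare;
    while((compare=generator.next())!=null)
    {
        //do comparison, etc.
    }
}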

next() would look something like

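String next()
{
    //lock requires a reference type, so an int index can't be locked directly.
    //"sync" is a hypothetical field: private readonly object sync = new object();
    lock(this.sync)  //see msdn for info.  Link below.
    {
        //determine next string
        //update index
    }
    //generate or get next string from list and return it
    //or if empty, return null
}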

see msdn for info

Kendrick
@MarceloRamires: That's the problem with self-managing this. You would need to have your thread method fetch work itself, which is going to cause synchronization issues to come into play. This is why the ThreadPool is nicer in these situations...
Reed Copsey
No, the while goes inside the thread (i.e. in myMethod()). There's overhead to starting a thread, so there's an optimal number of threads for your processor and problem. Spawning a new thread for each comparison might well be slower than doing it in the main thread.
Kendrick
You shouldn't start the threads in a while loop. Instead, have a while loop running on each of your 50 threads for as long as the next() method still returns a string to process.
treaschf
+2  A: 

You cannot get a completion event from the threading system. You can wait for a single thread with Thread.Join, but you cannot wait on "any thread" and find out which one completed first. Your best approach is to put a while loop in each thread that polls a queue of work items until the queue is empty.
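A minimal sketch of the polling loop described above, not the answerer's own code. It assumes the work items can be placed in a Queue<string> before the workers start; the class name, the item count, and the Validate stand-in are made up for illustration. .NET 3.5 has no built-in concurrent queue, so a plain lock guards every access.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class QueueWorkers
{
    // Shared work queue; every access goes through a lock on the queue object.
    static readonly Queue<string> work = new Queue<string>();

    // Hypothetical stand-in for the real validation.
    static void Validate(string s)
    {
        Thread.Sleep(10);
    }

    // Each worker keeps pulling items until the queue is empty, then exits.
    static void Worker()
    {
        while (true)
        {
            string item;
            lock (work)
            {
                if (work.Count == 0) return;
                item = work.Dequeue();
            }
            Validate(item); // runs outside the lock so workers overlap
        }
    }

    static void Main()
    {
        for (int i = 0; i < 200; i++) work.Enqueue("string " + i);

        Thread[] workers = new Thread[50];
        for (int x = 0; x < workers.Length; x++)
        {
            workers[x] = new Thread(Worker);
            workers[x].Start();
        }
        foreach (Thread t in workers) t.Join(); // wait for every worker to finish
        Console.WriteLine("All items processed");
    }
}
```

If the strings are produced lazily, as in the question, a worker that finds the queue empty cannot tell "finished" from "not produced yet", so some extra end-of-work signal would be needed.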

Albin Sunnanbo
+5  A: 

This sounds like you should be using the ThreadPool. You could then just do:

while(MoreWorkIsAvailable)
{
    string nextString = next();
    ThreadPool.QueueUserWorkItem(new WaitCallback(myMethod), nextString);
}

The thread pool even lets you put a hard cap on the number of threads running at one time via SetMaxThreads.
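A rough sketch of how that loop might be throttled so the pool never holds more than a bounded number of pending strings. The Semaphore, the MaxPending bound, and the Next and Validate stand-ins are assumptions for illustration, not part of the answer above. SetMaxThreads returns false when it rejects the request, for example when the cap is below the processor count.

```csharp
using System;
using System.Threading;

class ThrottledPool
{
    const int MaxPending = 100; // assumed bound on queued-or-running items
    static readonly Semaphore slots = new Semaphore(MaxPending, MaxPending);

    // Hypothetical stand-ins for the question's next() and its validation.
    static int produced = 0;
    static string Next() { return produced < 1000 ? "string " + produced++ : null; }
    static void Validate(string s) { Thread.Sleep(5); }

    static void MyMethod(object state)
    {
        try
        {
            Validate((string)state);
        }
        finally
        {
            slots.Release(); // free a slot so the producer can queue another item
        }
    }

    static void Main()
    {
        // Cap worker threads at 50, keeping the current I/O thread limit.
        int workerMax, ioMax;
        ThreadPool.GetMaxThreads(out workerMax, out ioMax);
        if (!ThreadPool.SetMaxThreads(50, ioMax))
            Console.WriteLine("SetMaxThreads rejected the requested cap");

        string nextString;
        while ((nextString = Next()) != null)
        {
            slots.WaitOne(); // blocks while MaxPending items are in flight
            ThreadPool.QueueUserWorkItem(new WaitCallback(MyMethod), nextString);
        }

        // Drain: once every slot can be re-acquired, all items have finished.
        for (int i = 0; i < MaxPending; i++) slots.WaitOne();
        Console.WriteLine("Done");
    }
}
```

Pool threads are background threads, so the drain loop at the end is what keeps Main from exiting while work is still running.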

Reed Copsey
+1 I've never used the ThreadPool, but if this works, it's the perfect solution to the problem.
Kendrick
Wouldn't this queue all one million of my strings in memory?
MarceloRamires
It would - if you're dealing with a million entries, it might be overkill - in that case, wait handles may be a better option...
Reed Copsey
@Reed Copsey what do you mean?
MarceloRamires
Reed Copsey
+1  A: 

Change your thread method so that it doesn't process just one piece of data, but keeps taking the "next unclaimed" one.

You'll want to have some synchronization around an enumerator's MoveNext, and grab a copy of the reference to Current. No two threads will be able to advance the enumerator and grab the item at the same time. Then once you have your reference, release the synchronization lock and do your validation.
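One way the synchronized enumerator could look, as a sketch under assumptions: SharedEnumerator, TryGetNext, and the Strings iterator are hypothetical names introduced here, and Thread.Sleep stands in for the validation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Wraps an enumerator so that several threads can pull items from it safely.
class SharedEnumerator<T>
{
    private readonly IEnumerator<T> source;
    private readonly object gate = new object();

    public SharedEnumerator(IEnumerable<T> items)
    {
        source = items.GetEnumerator();
    }

    // Advances and reads Current as one atomic step; false means exhausted.
    public bool TryGetNext(out T item)
    {
        lock (gate)
        {
            if (source.MoveNext())
            {
                item = source.Current;
                return true;
            }
        }
        item = default(T);
        return false;
    }
}

class Program
{
    // Hypothetical lazy source standing in for the question's next() method.
    static IEnumerable<string> Strings()
    {
        for (int i = 0; i < 200; i++) yield return "string " + i;
    }

    static int processed = 0;

    static void Worker(object state)
    {
        SharedEnumerator<string> shared = (SharedEnumerator<string>)state;
        string s;
        while (shared.TryGetNext(out s))
        {
            Thread.Sleep(5); // validation would happen here, outside the lock
            Interlocked.Increment(ref processed);
        }
    }

    static void Main()
    {
        SharedEnumerator<string> shared = new SharedEnumerator<string>(Strings());
        Thread[] workers = new Thread[50];
        for (int x = 0; x < workers.Length; x++)
        {
            workers[x] = new Thread(new ParameterizedThreadStart(Worker));
            workers[x].Start(shared);
        }
        foreach (Thread t in workers) t.Join();
        Console.WriteLine(processed); // prints 200: each item is claimed exactly once
    }
}
```

Because the iterator produces strings on demand, only the items currently being validated are held in memory.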

You might also want to look at Microsoft's Parallel Extensions (PFX) for taking advantage of multiple CPUs or cores. I haven't used it, but if your validation is pure and algorithmic (rather than checked against a db), involving multiple processors is the only way to beat the single-threaded model.

uosɐſ