views:

547

answers:

7

I have some work items pooled using the legacy QueueUserWorkItem function (I need to support OSs earlier than vista, so

for( <loop thru some items >)
{
    QueueUserWorkItem( )
}

I need to wait on these before proceeding to the next step. I've seen several similar answers...but they are in .NET. I'm thinking of using a storing an Event for each item and waiting on them(gasp!), but are there other better, lightweight ways? (no kernel locks)

Clarification: I know of using Events. I'm interested in solutions that don't require a kernel-level lock.

+1  A: 
  1. Sleep(N) (by the way .Net spinlock implemented on Sleep(0))
  2. WaitForSingleObject with or without timeout using together with Event or other sync -objects
Dewfy
+2  A: 

AFAIK the only ways you can do this is to have a counter that is InterlockIncrement'ed as each task finishes.

You can then either do a

while( counter < total )
    Sleep( 0 );

of the task can signal an event (or other sync object) and you can do the following

while( count < total )
    WaitForSingleObject( hEvent, INFINITE );

The second method will mean that the main thread uses less processing time.

Edit: TBH the only way to avoid a kernel lock is to spin lock and that will mean you'll have one core wasting time that could otherwise be used to process your work queue (or indeed anything else). If you REALLY must avoid a kernel lock then spin lock with a Sleep( 0 ). However I'd definitely recommend just using a kernel lock, get the extra CPU time back for doing worthwhile processing and stop worrying about a 'non' problem.

Goz
Busy waiting is bad.
jeffamaphone
Both Sleep(0) and WaitForSingleObject(INFINITE) will cause problems on an UI or STA thread. Both calls block the message loop preventing the window and COM message procs thus causing unresponsive UI and COM objects. The pattern in this answer is the worst possible pattern for waiting on multiple threads.
Franci Penov
+1  A: 

You could have a counter that each task increments atomically as it finishes (as previously mentioned) but then also have it check if it was the last task (counter == total) and set a single event if so. The main thread then just needs to WaitForSingleObject(). Just make sure the check is done atomically too.

// on task finished
Lock();
count++;
bool done = count == total;
Unlock();
if ( done )
    SetEvent( hEvent );
Chadwick
+1  A: 

I think using events is pretty much your best bet (and the most lightweight). The only thing I would add is that you can simplify your code when it comes to waiting for your work items to finish by using:

HANDLE WorkEvents[ 5 ]; // store you event handles here

...

WaitForMultipleObjects(5, WorkEvents, TRUE, INFINITE);

This will wait for all events to complete in one system call.


Edit: An alternative, if you don't mind spinning over your work items is to call GetExitCodeThread on each of the threads, checking for an exit status.

Alan
I can do a GetExitCodeThread on thread pools?
moogs
WaitForMultipleObjects is the right way to do this.
jeffamaphone
+2  A: 

This is possible w/o any wait if you actually split the execution stack: up to the for loop you execute on the calling thread, after the for loop you finish on the last queue thread:

 CallerFunction(...)
 {
   sharedCounter = <numberofitems>;
   for (<loop>)
   {
     QueueUserWorkItem(WorkerFunction, ...);
   }
   exit thread; (return, no wait logic)
 }

WorkerFunction(, ...) 
{
  // execute queued item here
  counter = InterlockeDdecrement(sharedCounter);
  if (0 == counter)
  {
    // this is the last thread, all others are done
   continue with the original logic from  CallerFunction here
  }
}

Whether this will work or not depends on many factors and I can't say w/o knowing more about the caller context, if is possible to suspend its execution on the calling thread and resume it on a queued thread. Btw by 'exit thread' I do not mean an abrupt thread abort, but a gracious return, and the entire calling stack is prepared to hand over execution context to the queue thread. Not a trivial task, I reckon.

Remus Rusanu
+1: We use a very similar pattern, except that we use single kernel event to signal that all workers have finished.
Filip Navara
@Filip: yes, the last thread signals the event and the caller wakes and resumes. What I proposed above is fully event free, as the OP asked, at the cost of the complexity of 'switching' the execution context. In truth this 'switch' is nothing else but a disguised asynchronous callback invocation so is not really that hard.
Remus Rusanu
i would chosen this had SO not auto-selected the answer. :(
moogs
No worries moogs
Remus Rusanu
+1  A: 

Here is an approach I've successfully used in the past:

Implement your "completion" task as a reference counted object. Each worker thread holds a reference to this object while it is doing its work, then releases it when finished. The completion task does its work when the ref count reaches 0.

Example

Note: my C++ is rusty after years of working primarily in C#, so treat the example below as pseudo-code

Completion Task

class MyCompletionTask {

private:

    long _refCount;

public:

    MyCompletionTask() {
        _refCount = 0;
    }

public: // Reference counting implementation

   // Note ref-counting mechanism must be thread-safe,
   // so we use the Interlocked functions.

    void AddRef()
    {
        InterlockedIncrement(&_refCount);
    }

    void Release()
    {
        long newCount = InterlockedDecrement(&_refCount);

        if (newCount == 0) {
             DoCompletionTask();
             delete this;
        }
    }

private:

    void DoCompletionTask()
    {
        // TODO: Do your thing here
    }
}

Calling Code

MyCompletionTask *task = new MyCompletionTask();

task->AddRef(); // add a reference for the main thread

for( <loop thru some items >)
{
    task->AddRef(); // Add a reference on behalf of the worker
                    // thread.  The worker thread is responsible
                    // for releasing when it is done.

    QueueUserWorkItem(ThreadProc, (PVOID)task, <etc> );
}

task->Release(); // release main thread reference

// Note: this thread can exit.  The completion task will run
// on the thread that does the last Release.

Thread Proc

void ThreadProc(void *context) {

    MyCompletionTask *task = (MyCompletionTask)context;

    //  TODO: Do your thing here

    task->Release();
}

One thing to keep in mind with this approach is that the thread on which the completion task completes is non-deterministic. It will depend on which worker thread finishes first (or the main thread, if all the worker threads finish before the main thread calls Release)

DSO
but won't this give me the same problem? i'm stuck with waiting in the calling thread for the completion task to finish. sorry if i don't get it.
moogs
With this approach, you don't do a wait in the main thread. In the completion task, you put the code that you want to run after all the worker threads have finished, and it will be invoked when the last Release is called.
DSO
+1  A: 

You have several alternatives:

  • Use one Semaphore instead of multiple Event instances - it supports counted locks/waits.
  • Use multiple Critical Section instances - they are lighter than Event.
  • Use InterlockedDecrement from worker threads and spin a Wait... (with short timeout) loop on the waiting thread handle. You'll have to tweak the timeout value to your taste. Also, make sure you are using the proper Wait... method that spins a message loop if your main thread is an STA. And be aware that in certain scenarios your wait can be interrupted for an APC event as well.
  • Use any of the techniques in the other answers to transfer the completion task on the last queued worker thread. note that this is useful only if the completion task is not bound to the main thread.
Franci Penov
Semaphore will not work in this situation because they are signalled when the count>0. He wants to wait until all the threads have finished (count==0).
DSO
@DSO - really? and he can't do a reverse count? :-)
Franci Penov