Here is an approach I've successfully used in the past:
Implement your "completion" task as a reference-counted object. Each worker thread holds a reference to this object while it is doing its work, then releases it when finished. The completion task does its work when the reference count reaches 0.
Example
Note: my C++ is rusty after years of working primarily in C#, so treat the example below as pseudo-code.
Completion Task
#include <windows.h>

// Must be heap-allocated with new: Release() calls delete this.
class MyCompletionTask {
private:
    volatile LONG _refCount;

public:
    MyCompletionTask() : _refCount(0) {}

public: // Reference counting implementation
    // Note: the ref-counting mechanism must be thread-safe,
    // so we use the Interlocked functions.
    void AddRef()
    {
        InterlockedIncrement(&_refCount);
    }

    void Release()
    {
        LONG newCount = InterlockedDecrement(&_refCount);
        if (newCount == 0) {
            // Last reference gone: run the completion, then free the object.
            DoCompletionTask();
            delete this;
        }
    }

private:
    void DoCompletionTask()
    {
        // TODO: Do your thing here
    }
};
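If you are not tied to Win32, the same object can be written portably with C++11 `std::atomic`. This is a sketch, not the answer's original code: the class name `CompletionTask` and the `std::function` callback are hypothetical choices. `AddRef` can use a relaxed increment because it is only called by a thread that already holds a reference (or before the object is shared), and `Release` uses `std::memory_order_acq_rel` so the thread that drops the count to zero sees every write other threads made before their own `Release`. Per the Win32 documentation, the Interlocked functions already act as full memory barriers, which is why the version above does not need to spell this out.

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Portable sketch of a reference-counted completion object.
// The std::function callback is a hypothetical design choice.
class CompletionTask {
public:
    explicit CompletionTask(std::function<void()> onComplete)
        : refCount_(0), onComplete_(std::move(onComplete)) {}

    void AddRef() {
        // Relaxed is enough: callers already hold a reference (or the object
        // is not yet shared), so the count cannot concurrently hit zero.
        refCount_.fetch_add(1, std::memory_order_relaxed);
    }

    void Release() {
        // acq_rel: each Release publishes this thread's earlier writes, and
        // the thread that brings the count to zero acquires all of them
        // before running the completion callback.
        if (refCount_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            onComplete_();
            delete this;
        }
    }

private:
    ~CompletionTask() = default; // heap-only: Release() calls delete this

    std::atomic<long> refCount_;
    std::function<void()> onComplete_;
};
```

Usage mirrors the Win32 version: allocate it with `new`, call `AddRef` once for the creating thread and once per worker, and have each worker call `Release` as the very last thing it does with the pointer.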
Calling Code
MyCompletionTask *task = new MyCompletionTask();
task->AddRef(); // add a reference for the main thread

for (int i = 0; i < numItems; ++i) // numItems: however many work items you have
{
    task->AddRef(); // Add a reference on behalf of the worker
                    // thread. The worker thread is responsible
                    // for releasing it when it is done.
    if (!QueueUserWorkItem(ThreadProc, task, WT_EXECUTEDEFAULT)) {
        task->Release(); // the item was never queued, so drop its reference
    }
}

task->Release(); // release the main thread's reference
// Note: this thread can now exit. The completion task will run
// on whichever thread does the last Release.
Thread Proc
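// Signature must match LPTHREAD_START_ROUTINE for QueueUserWorkItem.
DWORD WINAPI ThreadProc(LPVOID context) {
    MyCompletionTask *task = static_cast<MyCompletionTask *>(context);
    // TODO: Do your thing here
    task->Release(); // last use of task: it may be deleted inside Release
    return 0;
}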
One thing to keep in mind with this approach is that the thread on which the completion task runs is non-deterministic. It is whichever thread calls Release last: normally the worker thread that finishes last, or the main thread if all the worker threads finish before the main thread calls Release.
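To observe this non-determinism, here is a small sketch that swaps the hand-written counter for `std::shared_ptr` with a custom deleter (the shared_ptr control block maintains an atomic use count). The deleter stands in for `DoCompletionTask` and records the thread it ran on; each worker holds its own copy of the pointer and resets it when done. `RunWorkers` and `CompletionInfo` are hypothetical names used only for illustration.

```cpp
#include <memory>
#include <thread>
#include <vector>

// Records which thread ran the completion step and how many times it ran.
struct CompletionInfo {
    std::thread::id ranOn;
    int runs = 0;
};

// Starts workerCount threads that share one shared_ptr "token". Its custom
// deleter plays the role of DoCompletionTask, so it runs on whichever thread
// destroys the last copy. participants[0] is the main thread; participants[i+1]
// is worker i.
CompletionInfo RunWorkers(int workerCount, std::vector<std::thread::id>& participants) {
    CompletionInfo info;
    participants.assign(1, std::this_thread::get_id());
    participants.resize(workerCount + 1);

    std::shared_ptr<int> token(new int(0), [&info](int* p) {
        info.ranOn = std::this_thread::get_id();
        ++info.runs;
        delete p;
    });

    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) {
        // Capturing token by value gives each worker its own reference.
        workers.emplace_back([token, i, &participants]() mutable {
            participants[i + 1] = std::this_thread::get_id();
            token.reset(); // release this worker's reference
        });
    }
    token.reset(); // release the main thread's reference

    // join() makes the deleter's writes to info visible to this thread.
    for (auto& w : workers) w.join();
    return info;
}
```

Calling `RunWorkers` several times may report the main thread on some runs and different workers on others; the only guarantee is that the completion runs exactly once, on one of the participating threads.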