Hi,

I have the following situation:

I create a boost::thread_group instance, then create threads for parallel processing of some data, then call join_all on the group.

Initially I created the threads for every X elements of data, like so:

// begin = someVector.begin();
// end = someVector.end();
// batchDispatcher = boost::function<void(It, It)>(...);

boost::thread_group     processors;

// create dispatching thread every ASYNCH_PROCESSING_THRESHOLD notifications
while(end - begin > ASYNCH_PROCESSING_THRESHOLD)
{
    NotifItr split = begin + ASYNCH_PROCESSING_THRESHOLD;

    processors.create_thread(boost::bind(batchDispatcher, begin, split));
    begin = split;
}

// create dispatching thread for the remainder
if(begin < end)
{
    processors.create_thread(boost::bind(batchDispatcher, begin, end));
}

// wait for parallel processing to finish
processors.join_all();

But I have a problem with this: when I have lots of data, this code generates lots of threads (> 40), which keeps the processor busy with context switching.

My question is this: is it possible to call create_thread on the thread_group after the call to join_all?

That is, can I change my code to this?

boost::thread_group     processors;
size_t                  processorThreads = 0; // NEW CODE

// create dispatching thread every ASYNCH_PROCESSING_THRESHOLD notifications
while(end - begin > ASYNCH_PROCESSING_THRESHOLD)
{
    NotifItr split = begin + ASYNCH_PROCESSING_THRESHOLD;

    processors.create_thread(boost::bind(batchDispatcher, begin, split));
    begin = split;

    if(++processorThreads >= MAX_ASYNCH_PROCESSORS) // NEW CODE
    {                               // NEW CODE
        processors.join_all();      // NEW CODE
        processorThreads = 0;       // NEW CODE
    }                               // NEW CODE
}

// ... 

Whoever has experience with this, thanks for any insight.

+1  A: 

Hi,

I believe this is not possible. The solution you want might actually be to implement a producer-consumer or master-worker pattern (a main 'master' thread divides the work into several fixed-size tasks, creates a pool of 'worker' threads, and hands one task to each worker until all tasks are done).

These solutions demand some synchronization (semaphores, or a mutex and condition variable), but they balance the load well, since you can create one thread per available core on the machine and avoid wasting time on context switches.

Another not-so-good-and-fancy option is to join one thread at a time. You can keep a vector with 4 active threads, join one, and create another. The problem with this approach is that you may waste processing time if your tasks are heterogeneous, since you block on one specific thread while others may already have finished.

Edu
utnapistim