views:

670

answers:

10

Hi,

I have a number crunching application written in C. It is basically a main loop that, for increasing values of "i", calls a function that performs some calculations on each value. I have read about multithreading, and I am considering learning a bit about it in C. I wonder whether general code like mine could somehow be automatically multithreaded, and how.

Thanks

P.S. To get an idea about my code, let's say that it is something like this:

int main(...)
{
    ...
    for (i = 0; i <= ntimes; i++)
        get_result(x[i], y[i], &result[i]);
    ...
}

void get_result(float x, float y, float *result)
{
    *result = sqrt(log(x) + log(y) + cos(exp(x + y)));
    /* ...and some more similar mathematical operations */
}
+20  A: 

If the task is highly parallelizable and your compiler is modern, you could try OpenMP. http://en.wikipedia.org/wiki/OpenMP

Novikov
+1 Beat me to it. OpenMP is perfect for the kind of stuff that the OP describes.
casablanca
Yes. What I wonder is whether there will be overhead when using multithreading instead of just doing the "brute force" main loop; I mean, is it usually the case that in situations like this multithreading outperforms serial calculation, even though there must be some kind of control overhead for the threads?
Werner
@Werner: That depends largely on the problem you are solving and the implementation of the algorithms. The more things that can be done completely independent from the rest, the better it lends itself to parallel execution.
Bart van Ingen Schenau
If `ntimes` is large, then parallel should be faster; the bulk of the code that runs can be the same in both cases (by passing off large chunks rather than individual iterations). OTOH, I'd expect that something like OpenMP has all of that figured out already.
BCS
+3  A: 

Depending on the OS, you could use POSIX threads. You could instead implement stack-less multithreading using state machines. There is a really good book entitled "Embedded Multitasking" by Keith E. Curtis. It's just a neatly crafted set of switch/case statements. It works great; I've used it on everything from Apple Macs to Rabbit Semiconductor boards, AVR, and PC.

ValiRossi
What would be the point of "multithreading" (co-states are not threads) a compute-bound process in a model where you could _NEVER_ have true concurrency?
San Jacinto
True state machines do not use context switches and are not "real" threads. The originator had not stated what OS or platform at the time I commented. If it is a simple matter of program responsiveness, state machines may be a good solution, and certainly easier to debug. Blocking is an issue, though.
ValiRossi
+6  A: 

If you are hoping to provide concurrency for a single loop for some kind of scientific computing or similar, OpenMP, as @Novikov says, really is your best bet; this is what it was designed for.

If you're looking to learn the more classical approach that you would more typically see in an application written in C... On POSIX you want pthread_create() et al. I'm not sure what your background might be with concurrency in other languages, but before going too deeply into that, you will want to know your synchronization primitives (mutexes, semaphores, etc.) fairly well, as well as understanding when you will need to use them. That topic could be a whole book or set of SO questions unto itself.

asveikau
+5  A: 

You should have a look at OpenMP for this. The C/C++ example on this page is similar to your code: https://computing.llnl.gov/tutorials/openMP/#SECTIONS

#include <omp.h>
#define N 1000

int main(void)
{
    int i;
    float a[N], b[N], c[N], d[N];

    /* Some initializations */
    for (i = 0; i < N; i++) {
        a[i] = i * 1.5;
        b[i] = i + 22.35;
    }

    #pragma omp parallel shared(a,b,c,d) private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            for (i = 0; i < N; i++)
                c[i] = a[i] + b[i];

            #pragma omp section
            for (i = 0; i < N; i++)
                d[i] = a[i] * b[i];

        }  /* end of sections */
    }  /* end of parallel section */

    return 0;
}

If you prefer not to use OpenMP, you could use pthreads directly, or even clone()/wait() on Linux.

No matter which route you choose, you are just dividing your arrays into chunks that each thread will process. If all of your processing is purely computational (as your example function suggests), then you should do well with only as many threads as you have logical processors.

There is some overhead in adding threads to do parallel processing, so make sure that you give each thread enough work to make up for it. Usually you will, but if each thread ends up with only one computation to do and the computations aren't that expensive, you may actually slow things down. In that case you can always use fewer threads than you have processors.

If you do have some IO going on in your work, then you may find that having more threads than processors is a win, because while one thread is blocked waiting for some IO to complete, another thread can be doing its computations. Be careful when doing IO to the same file from multiple threads, though.

nategoose
+1  A: 

Hi,
A good exercise for learning concurrent programming in any language is to work on a thread pool implementation.

In this pattern you create some threads in advance and treat them as a resource. A thread pool object/structure is used to assign user-defined tasks to those threads for execution. When a task is finished, you can collect its results. You can use a thread pool as a general-purpose design pattern for concurrency. The main idea could look similar to:

#define number_of_threads_to_be_created 42

// Create some user-defined tasks
Tasks_list_t* task_list_elem = CreateTasks();

// Create a thread pool with 42 worker threads
Thpool_handle_t* pool = Create_pool(number_of_threads_to_be_created);

// Populate the thread pool with tasks
for ( ; task_list_elem; task_list_elem = task_list_elem->next) {
   add_a_task_to_thpool(task_list_elem, pool);
}

// Kick-start the thread pool
thpool_run(pool);

// Now decide on the mechanism for collecting the results from the task list.
// Some of the candidates are:
// 1. Sleep until all is done (naive).
// 2. Poll each task in the list for a state variable indicating that the
//    task has finished. This can work quite well in some situations.
// 3. Implement a signal/callback mechanism that a task can use to signal
//    that it has finished executing.

The mechanism for collecting data from tasks and the number of threads used in the pool should be chosen to reflect your requirements and the capabilities of the hardware and runtime environment.
Also note that this pattern does not say anything about how your tasks should be "synchronized" with each other or with the outside world. Error handling can also be a bit tricky (for example: what to do when one task fails). Those two aspects need to be thought through in advance; they can restrict the usage of the thread pool pattern.

About thread pool:
http://en.wikipedia.org/wiki/Thread_pool_pattern
http://docs.sun.com/app/docs/doc/816-5137/ggedn?a=view
A good piece of literature about pthreads to get going:
http://www.advancedlinuxprogramming.com/alp-folder/alp-ch04-threads.pdf

Marcin
+5  A: 

One alternative for multithreading your code would be to use pthreads (they provide more precise control than OpenMP).

Assuming x, y & result are global variable arrays,

#include <pthread.h>

...

void *get_result(void *param)   // param carries the element index
{
    size_t i = (size_t)param;
    result[i] = sqrt(log(x[i]) + log(y[i]) + cos(exp(x[i] + y[i])));
    return NULL;
}

int main()
{
...
pthread_t *tid = malloc( ntimes * sizeof(pthread_t) );

for( i=0; i<ntimes; i++ ) 
    pthread_create( &tid[i], NULL, get_result, (void *)(size_t)i );

... // do some tasks unrelated to result    

for( i=0; i<ntimes; i++ ) 
    pthread_join( tid[i], NULL );
...
}

(Compile your code with gcc prog.c -lpthread -lm)

Kedar Soparkar
A: 

To specifically address the "automatically multithreaded" part of the OP's question:

One really interesting approach to programming parallelism was designed into a language called Cilk Plus, originally developed at MIT and now owned by Intel. To quote Wikipedia, the idea is that

"the programmer should be responsible for exposing the parallelism, identifying elements that can safely be executed in parallel; it should then be left to the run-time environment, particularly the scheduler, to decide during execution how to actually divide the work between processors."

Cilk Plus is a superset of standard C++. It just contains a few extra keywords (_Cilk_spawn, _Cilk_sync, and _Cilk_for) that allow the programmer to tag parts of their program as parallelizable. The programmer does not mandate that any code be run on a new thread, they just allow the lightweight runtime scheduler to spawn a new thread if and only if it is actually the right thing to do under the particular runtime conditions.

To use Cilk Plus, just add its keywords into your code, and build with Intel's C++ compiler.

Jon Rodriguez
+1  A: 

Your code is not automatically multi-threaded by the compiler, if that was your question. Note that the C standard itself knows nothing about multi-threading, since whether you can use multi-threading or not does not depend on the language you are coding in, but on the platform you are coding for. Code written in C can run on pretty much anything for which a C compiler exists; there is even a C compiler for the C64 computer (almost completely ISO C99 conformant). However, to support multiple threads, the platform must have an operating system that supports them, and usually that means certain CPU functionality must be present. An operating system can do multithreading almost exclusively in software; this will be awfully slow and there won't be memory protection, but it is possible. Even then, though, you need at least programmable interrupts.

So how to write multi-threaded C code depends entirely on the operating system of your target platform. There are POSIX-conformant systems (OS X, FreeBSD, Linux, etc.) and systems that have their own libraries for it (Windows). Some systems have more than one library for it (e.g. OS X has the POSIX library, but there is also the Carbon Thread Manager you can use from C, though I think it is rather legacy nowadays).

Of course there are cross-platform thread libraries, and some modern compilers support things like OpenMP, where the compiler automatically builds code to create threads on your chosen target platform; but not many compilers support it, and those that do are usually not feature-complete. Usually you get the widest system support by using POSIX threads, more often called "pthreads". The only major platform not supporting them is Windows, and there you can use free third-party libraries like this one. Several other ports exist as well (Cygwin has one for sure). If you will have a UI of some kind one day, you may want to use a cross-platform library like wxWidgets or SDL, both of which offer consistent multi-threading support on all supported platforms.

Mecki
+1  A: 

If the iterations of the loop are independent of one another, then there's a very simple approach: try multi-processing rather than multi-threading.

Say you have 2 cores and ntimes is 100; then 100/2 = 50, so create 2 versions of the program where the first iterates from 0 to 49 and the other from 50 to 99. Run them both, and your cores should be kept quite busy.

This is a very simplistic approach, yet you don't have to mess with thread creation, synchronization, etc.

ifyes
The two processes need to run for at least a minute or two to make up for the cost of starting a new process via fork(). That said, I do use fork() for batch processing when each concurrent task is long-lived.
Michael Shopsin
+1  A: 

Intel's C++ compiler is actually capable of automatically parallelizing your code. It's just a compiler switch you need to enable. It doesn't work as well as OpenMP, though (i.e. it doesn't always succeed, or the resulting program is slower). From Intel's website: "Auto-parallelization, which is triggered by the -parallel (Linux* OS and Mac OS* X) or /Qparallel (Windows* OS) option, automatically identifies those loop structures that contain parallelism. During compilation, the compiler automatically attempts to deconstruct the code sequences into separate threads for parallel processing. No other effort by the programmer is needed."

Cornelius Scarabeus