views:

2099

answers:

5

I was searching Google for a page offering some simple OpenMP algorithms. There is probably an example somewhere for calculating min, max, median, and average of a huge data array, but I am unable to find it.

Failing that, I would divide the array into one chunk per core and do some combining calculation afterwards to get the result for the complete array.
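
Something like the following rough sketch (untested) is what I have in mind: split the data into one chunk per thread, take the minimum of each chunk, and combine the chunk results at the end.

#include <vector>
#include <algorithm>
#include <limits>
#include <omp.h>

// Rough, untested sketch: one chunk per thread, then a serial combine.
double chunked_min(const std::vector<double>& data)
{
    std::vector<double> partial;
#pragma omp parallel
    {
#pragma omp single
        partial.assign(omp_get_num_threads(), std::numeric_limits<double>::max());

        const int t  = omp_get_thread_num();
        const int nt = omp_get_num_threads();
        const size_t chunk = (data.size() + nt - 1) / nt;
        const size_t begin = t * chunk;
        const size_t end   = std::min(data.size(), begin + chunk);

        // each thread scans only its own chunk
        double local = std::numeric_limits<double>::max();
        for (size_t i = begin; i < end; ++i)
            local = std::min(local, data[i]);
        partial[t] = local;
    }
    // combine the per-chunk results
    double result = std::numeric_limits<double>::max();
    for (size_t i = 0; i < partial.size(); ++i)
        result = std::min(result, partial[i]);
    return result;
}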

I just didn't want to reinvent the wheel.


Additional remark: I know that there are thousands of examples that work with a simple reduction, e.g. calculating pi.

const int num_steps = 100000; 
double x, sum = 0.0; 
const double step = 1.0/double(num_steps); 
#pragma omp parallel for reduction(+:sum) private(x) 
for (int i=1;i<= num_steps; i++){ 
  x = double(i-0.5)*step; 
  sum += 4.0/(1.0+x*x); 
} 
const double pi = step * sum;

But when this kind of algorithm isn't usable, there are almost no examples of reduction algorithms left.

+4  A: 

This page contains some examples (not exactly the ones you asked for, but sorting et al.)

Should get you started.

http://msdn.microsoft.com/en-us/magazine/cc163717.aspx

Suvesh Pratapa
+3  A: 

These are typical reduction problems.

Besides the page Suvesh pointed to, you might have a look at the documentation for the reduction clause.
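
As an untested sketch, a sum (and from it the average) maps directly onto the clause; min and max need one of the workarounds shown in the other answers:

#include <vector>

// Untested sketch: the reduction clause accumulates the per-thread
// partial sums, and the average follows from the total.
double average(const std::vector<double>& v)
{
  double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
  for (long i = 0; i < (long)v.size(); ++i)
    sum += v[i];
  return sum / v.size();
}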

Steph
+1  A: 

OpenMP doesn't support these reduction operations. Consider Intel Threading Building Blocks' parallel_reduce algorithm, which lets you implement an arbitrary reduction.

Here is an example. It uses summation of partial results; you can implement any combining function you want.

#include <stdio.h>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>
#include <tbb/task_scheduler_init.h>


///////////////////////////////////////////////////////////////////////////////


class PiCalculation
{
private:
    long num_steps;
    double step;

public:

    // Pi partial value
    double pi;

    // Calculate the partial value over one sub-range
    void operator () (const tbb::blocked_range<long> &r)
    {
        double sum = 0.0;

        long end = r.end();

        for (long i = r.begin(); i != end; i++)
        {
            double x = (i + 0.5) * step;
            sum += 4.0/(1.0 + x * x);
        }

        pi += sum * step;
    }

    // Combine results. Here you can implement any combining function
    void join(PiCalculation &p)
    {
        pi += p.pi;
    }

    // Splitting constructor, required by parallel_reduce
    PiCalculation(PiCalculation &p, tbb::split)
    {
        pi = 0.0;
        num_steps = p.num_steps;
        step = p.step;
    }

    // Construct with the number of integration steps
    PiCalculation(long steps)
    {
        pi = 0.0;
        num_steps = steps;
        step = 1./(double)num_steps;
    }
};


///////////////////////////////////////////////////////////////////////////////


int main()
{
    tbb::task_scheduler_init init;

    const long steps = 100000000;

    PiCalculation pi(steps);

    tbb::parallel_reduce(tbb::blocked_range<long>(0, steps, 1000000), pi);

    printf ("Pi is %3.20f\n", pi.pi);
}

Please check this link for additional reduction algorithms: http://cache-www.intel.com/cd/00/00/30/11/301132_301132.pdf#page=19 (see section 3.3.1, which has an example of finding the minimum value in an array).
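
A minimal sketch of such a minimum reduction with parallel_reduce (untested; class and variable names are mine) might look like this:

#include <vector>
#include <limits>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

// Untested sketch of a minimum reduction: each sub-range keeps the
// smallest element it has seen, and join() merges two partial results.
class MinReduction
{
    const std::vector<double> &data;
public:
    double min_value;

    void operator () (const tbb::blocked_range<size_t> &r)
    {
        for (size_t i = r.begin(); i != r.end(); ++i)
            if (data[i] < min_value) min_value = data[i];
    }

    void join(const MinReduction &other)
    {
        if (other.min_value < min_value) min_value = other.min_value;
    }

    MinReduction(MinReduction &other, tbb::split)
        : data(other.data), min_value(std::numeric_limits<double>::max()) {}

    MinReduction(const std::vector<double> &d)
        : data(d), min_value(std::numeric_limits<double>::max()) {}
};

// Usage:
//   MinReduction body(values);
//   tbb::parallel_reduce(tbb::blocked_range<size_t>(0, values.size()), body);
//   double smallest = body.min_value;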

Vova
This kind of reduction is very easy in OpenMP, and there is the huge advantage that the code does not differ between serial and multithreaded. But it is limited to the simple built-in reduction operations. const int num_steps = 100000; double x, sum = 0.0; const double step = 1.0/double(num_steps); #pragma omp parallel for reduction(+:sum) private(x) for (int i=1;i<= num_steps; i++){ x = double(i-0.5)*step; sum += 4.0/(1.0+x*x); } const double pi = step * sum;
Totonga
Dear Totonga, OpenMP limits reduction to a few built-in arithmetic operators such as +, - and *. In TBB you can implement an arbitrary reduction function. That's the advantage.
Vova
+2  A: 

OpenMP (at least 2.0) supports reduction for some simple operations, but not for max and min.

In the following example the reduction clause is used to compute a sum, and a critical section is used to update a shared variable from a thread-local one without conflicts.

#include <iostream>
#include <cmath>
#include <stdint.h>

int main()
{
  double sum = 0;
  uint64_t ii;
  uint64_t maxii = 0;
  uint64_t maxii_shared = 0;
#pragma omp parallel shared(maxii_shared) private(ii) firstprivate(maxii)
  {
#pragma omp for reduction(+:sum) nowait
    for(ii=0; ii<10000000000; ++ii)
      {
        sum += std::pow((double)ii, 2.0);
        if(ii > maxii) maxii = ii;
      }
#pragma omp critical 
    {
      if(maxii > maxii_shared) maxii_shared = maxii;
    }
  }
  std::cerr << "Sum: " << sum << " (" << maxii_shared << ")" << std::endl;
}

EDIT: a cleaner implementation:

#include <cmath>
#include <ctime>
#include <limits>
#include <vector>
#include <iostream>
#include <algorithm>
#include <tr1/random>

// sum the elements of v
double sum(const std::vector<double>& v)
{
  double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
  for(size_t ii=0; ii< v.size(); ++ii)
    {
      sum += v[ii];
    }
  return sum;
}

// extract the minimum of v
double min(const std::vector<double>& v)
{
  double shared_min = std::numeric_limits<double>::max();
#pragma omp parallel 
  {
    double min = std::numeric_limits<double>::max();
#pragma omp for nowait
    for(size_t ii=0; ii<v.size(); ++ii)
      {
        min = std::min(v[ii], min);
      }
#pragma omp critical 
    {
      shared_min = std::min(shared_min, min);
    }
  }
  return shared_min;
}

// generate a random vector and use sum and min functions.
int main()
{
  using namespace std;
  using namespace std::tr1;

  std::tr1::mt19937 engine(time(0));
  std::tr1::uniform_real<> unigen(-1000.0,1000.0);
  std::tr1::variate_generator<std::tr1::mt19937, 
    std::tr1::uniform_real<> >gen(engine, unigen);

  std::vector<double> random(1000000);
  std::generate(random.begin(), random.end(), gen);

  cout << "Sum: " << sum(random) << " Mean:" << sum(random)/random.size()
       << " Min:" << min(random) << endl;
}
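
As an untested follow-up sketch, the average falls out of sum(), and the median can be taken serially with std::nth_element (a truly parallel selection is considerably more involved). This reuses the headers already included above:

// average of the elements of v (uses the parallel sum() above)
double average(const std::vector<double>& v)
{
  return sum(v) / v.size();
}

// median of v (serial); takes a copy because nth_element reorders elements.
// For an even count this returns the upper of the two middle elements.
double median(std::vector<double> v)
{
  std::nth_element(v.begin(), v.begin() + v.size()/2, v.end());
  return v[v.size()/2];
}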
baol
+1  A: 

It is indeed hard to find examples, but keep on searching; they are out there.

This general tutorial, https://computing.llnl.gov/tutorials/openMP/, is a good place to start with OpenMP.

Veltz