views:

147

answers:

2

Suppose you want to compute the sum of the squared differences of consecutive items:

$\sum_{i=1}^{N-1} (x_i - x_{i+1})^2$

The simplest code (the input is std::vector<double> xs, the output is sum2) is:

double sum2 = 0.;
double prev = xs[0];
for (std::vector<double>::const_iterator i = xs.begin() + 1;
     i != xs.end(); ++i)
{
    sum2 += (prev - *i) * (prev - *i); // only one subtraction with compiler optimization
    prev = *i;
}

I hope the compiler performs the optimization noted in the comment above. If N is the length of xs, this costs N-1 multiplications and 2N-3 additions (counting both + and -).

Now suppose you already know this quantity:

$x_1^2 + x_N^2 + 2 \sum_{i=2}^{N-1} x_i^2$

and call it sum. Expanding the binomial square:

$\sum_{i=1}^{N-1} (x_i - x_{i+1})^2 = sum - 2\sum_{i=1}^{N-1} x_i x_{i+1}$
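To convince yourself the identity holds, a quick throwaway check (not part of the question; direct and expanded are hypothetical names) can compare both sides on sample data:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Direct evaluation of sum_{i=1}^{N-1} (x_i - x_{i+1})^2.
double direct(const std::vector<double>& xs)
{
    double s = 0.;
    for (std::size_t i = 0; i + 1 < xs.size(); ++i) {
        double d = xs[i] - xs[i + 1];
        s += d * d;
    }
    return s;
}

// Evaluation via the expanded form:
// sum = x_1^2 + x_N^2 + 2 * sum_{i=2}^{N-1} x_i^2,
// result = sum - 2 * sum_{i=1}^{N-1} x_i x_{i+1}.
double expanded(const std::vector<double>& xs)
{
    const std::size_t n = xs.size();
    double sum = xs[0] * xs[0] + xs[n - 1] * xs[n - 1];
    for (std::size_t i = 1; i + 1 < n; ++i)
        sum += 2. * xs[i] * xs[i];

    double cross = 0.;
    for (std::size_t i = 0; i + 1 < n; ++i)
        cross += xs[i] * xs[i + 1];

    return sum - 2. * cross;
}
```

For xs = {1, 2, 4, 7} both forms give 1 + 4 + 9 = 14.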

so the code becomes:

double sum2 = 0.;
double prev = xs[0];
for (std::vector<double>::const_iterator i = xs.begin() + 1;
     i != xs.end(); ++i)
{
    sum2 += *i * prev;
    prev = *i;
}
sum2 = -2. * sum2 + sum;

Here I have N multiplications and N-1 additions. In my case N is about 100.

Well, compiling with g++ -O2 I get no speed-up (I tried calling the inlined function 2M times). Why?

+2  A: 

The multiplications are much more costly than the additions in terms of execution time. Also, depending on the processor, additions and multiplications can be done in parallel, i.e. it will start the next multiplication while it is still doing the addition (see http://en.wikipedia.org/wiki/Out-of-order_execution).

So reducing the number of additions will not help much for the performance.

What you can do is make it easier for the compiler to vectorize your code, or vectorize by yourself. To make it easier for the compiler to vectorize, I would use a regular array of doubles, use subscripts and not pointers.

EDIT: N = 100 might also be too small a number to see a difference in execution time. Try a bigger N.

Dirty code, but it shows the perf improvement. Output:

1e+06
59031558
1e+06
18710703

The speedup you get is ~3x.

#include <vector>
#include <iostream>

using namespace std;

unsigned long long int rdtsc(void)
{
  unsigned a, d;

  __asm__ volatile("rdtsc" : "=a" (a), "=d" (d));

  return ((unsigned long long)a) | (((unsigned long long)d) << 32);
}



double f(std::vector<double>& xs)
{
  double sum2 = 0.;
  double prev = xs[0];

  vector<double>::const_iterator iend = xs.end();
  for (vector<double>::const_iterator i = xs.begin() + 1;
       i != iend; ++i)
    {
      sum2 += (prev - *i) * (prev - *i); // only one subtraction with compiler optimization
      prev = (*i);
    }

  return sum2;
}

double f2(double *xs, int N)
{
  double sum2 = 0;

  for(int i = 0; i < N - 1; i+=1) {
    sum2 += (xs[i+1] - xs[i])*(xs[i+1] - xs[i]);

  }

  return sum2;
}

int main(int argc, char* argv[])
{
  int N = 1000001;
  std::vector<double> xs;
  for(int i=0; i<N; i++) {
    xs.push_back(i);
  }

  unsigned long long int a, b;
  a = rdtsc();
  std::cout << f(xs) << endl;
  b = rdtsc();
  cout << b - a << endl;

  a = rdtsc();
  std::cout << f2(&xs[0], N) << endl;
  b = rdtsc();
  cout << b - a << endl;
}
Kamchatka
N is fixed by the problem
wiso
Right. But if you can, it's easier to have N bigger to check speedups.
Kamchatka
a) Why do you use i+=1 instead of ++i? b) Is the speed-up because you are using a plain array instead of std::vector, or because you are not using the `prev` variable?
wiso
a) Because I tried loop unrolling with i+=4 and it didn't give much more speed-up. b) The speed-up is mostly because I use a plain array (you can try by removing the prev in your example). Also, compilers do a better job if you use standard constructs (though it's not a general rule). If possible, you should try ICC (Intel C Compiler). There is a free version for personal use and it is really good at loop analysis.
Kamchatka
with g++ -O2 I don't get your improvement: 1e+06->7422819, 1e+06->7186527
wiso
Interesting. I used g++ without flags, which is generally g++ -O2. What architecture are you running on?
Kamchatka
linux x86_64, gcc 4. I think -O2 is not the default. From man gcc "-O0 Reduce compilation time and make debugging produce the expected results. This is the default."
wiso
+1  A: 

Addition can be free when done as x += a*b (a fused multiply-add). The compiler should be able to figure that out in the first version, if the architecture supports it.
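A minimal sketch of what this means in practice (sum_sq_diff_fma is a hypothetical name, not from the question): the loop body sum2 += d * d is exactly a multiply followed by an add, and std::fma makes the fusion explicit.

```cpp
#include <cassert>
#include <cmath>

// Same computation as the question's first loop, but with the
// multiply-add written as an explicit fused multiply-add.
// On hardware with FMA support this can be a single instruction
// (and a single rounding).
double sum_sq_diff_fma(const double* xs, int n)
{
    double sum2 = 0.;
    for (int i = 0; i + 1 < n; ++i) {
        double d = xs[i] - xs[i + 1];
        sum2 = std::fma(d, d, sum2); // sum2 += d * d, fused
    }
    return sum2;
}
```

Whether this is actually faster depends on the target; with -O2 -march=native the compiler will often emit the fused form on its own.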

The arithmetic is probably happening in parallel with the dereference of *i, which could otherwise be the slower part.

Do not call xs.end() on every loop iteration unless you expect its return value to change. If the compiler cannot optimize it out, it will dwarf the rest of the loop.
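The hoisting this suggests looks like the following sketch (sum_all is a hypothetical name used only for illustration):

```cpp
#include <cassert>
#include <vector>

// Hoist the end iterator out of the loop so xs.end() is evaluated
// exactly once, instead of once per iteration.
double sum_all(const std::vector<double>& xs)
{
    double s = 0.;
    const std::vector<double>::const_iterator iend = xs.end(); // evaluated once
    for (std::vector<double>::const_iterator i = xs.begin(); i != iend; ++i)
        s += *i;
    return s;
}
```

This is the same pattern the earlier answer's f() uses with its iend variable.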

drawnonward