ansaurus

Question

Answer 1

A:

You want memory accesses to be adjacent. In your case, simply swap i and j when accessing arr.

Yann Ramin 2010-05-01 02:42:18

That doesn't make sense. It'll segfault. I agree that I want to access sequential memory, but I don't think that is the way to do it.

Hristo 2010-05-01 02:43:32

@hristo: swap the dimensions of the array too, so that the swapped (transposed) i and j are valid.

Potatoswatter 2010-05-01 02:47:02

Answer 2

+2 A:

In-place transposition of a general (non-square) matrix is very difficult, but if you're okay with transposing it into another array, then it's pretty simple.

const int cols = 500; 
const int rows = 100; 

int arr[rows][cols];
// fill arr[][]

int arrT[cols][rows];
for (int r = 0; r < rows; r++) {
   for (int c = 0; c < cols; c++) {
      arrT[c][r] = arr[r][c];
   }
}

Of course, depending on how you're getting arr[][], you can just fill arrT[][] directly instead.
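If the dimensions are only known at run time, the same copy can be sketched with std::vector. This is just an illustration, not the asker's actual code: the Matrix alias and the transposed() helper are names made up for the example, and it assumes every row has the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a matrix stored as a vector of rows.
using Matrix = std::vector<std::vector<int>>;

// Returns a transposed copy, so that result[c][r] == m[r][c].
Matrix transposed(const Matrix& m) {
    const std::size_t rows = m.size();
    const std::size_t cols = rows ? m[0].size() : 0;
    Matrix t(cols, std::vector<int>(rows));
    for (std::size_t r = 0; r < rows; ++r) {
        assert(m[r].size() == cols);  // assumption: the input is rectangular
        for (std::size_t c = 0; c < cols; ++c) {
            t[c][r] = m[r][c];
        }
    }
    return t;
}
```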

However, there may be a simpler solution: simply swap the order of the loops.

for(int k = 0; k < T; ++k) { // for each trainee
  myscore[k] = 0;
  for(int j = 0; j < rows; ++j) { // for each expert
    for(int i = 0; i < cols; ++i) { // for each sample  
      myscore[k] += delta(i, anotherArray[k][i], arr[j][i]);
    }   
  }
}

polygenelubricants 2010-05-01 02:45:07

I tried swapping the j and i loops, and unfortunately it runs slower than the other way around, which doesn't make sense.

Hristo 2010-05-01 02:52:18

Answer 3

+1 A:

  for(int i = 0; i < N; ++i) { // for each sample  
    for(int j = 0; j < E[i]; ++j) { // for each expert
      ... arr[j][i] ... // each ++j causes a large stride => poor caching
    }   
  }

Transpose the loops:

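  // E[i] is not in scope here; assumes one expert count, numExperts (hypothetical), for all samples
  for(int j = 0; j < numExperts; ++j) { // for each expert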
    for(int i = 0; i < N; ++i) { // for each sample  
      ... arr[j][i] ... // each ++i looks to the next word in memory => good
    }   
  }

Of course, without seeing everything else in the program, I can't say if that would cause a problem. If delta doesn't have side effects, you should be fine.
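To make the stride difference concrete, here is a minimal sketch of both loop orders over one flat, row-major buffer. The function names and the flat std::vector layout are assumptions for the example, not the asker's code. Both functions return the same sum; only the memory access pattern differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inner loop walks along a row: consecutive iterations touch adjacent elements.
long long sumRowByRow(const std::vector<int>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    long long total = 0;
    for (std::size_t j = 0; j < rows; ++j) {
        for (std::size_t i = 0; i < cols; ++i) {
            total += a[j * cols + i];  // stride of 1 element
        }
    }
    return total;
}

// Inner loop walks down a column: consecutive iterations are cols elements apart.
long long sumColumnByColumn(const std::vector<int>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    long long total = 0;
    for (std::size_t i = 0; i < cols; ++i) {
        for (std::size_t j = 0; j < rows; ++j) {
            total += a[j * cols + i];  // stride of cols elements
        }
    }
    return total;
}
```

Whether the first version is actually faster depends on the array size, the cache, and what else the loop body does, so it is worth timing both on the real data.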

Potatoswatter 2010-05-01 02:50:32

I did this already and it runs slower, which doesn't help. Something I've been hiding is that this runs using Intel's TBB and is parallelized for multiple cores. It should run faster b/c it is more cache friendly, but it isn't :/

Hristo 2010-05-01 02:53:51

@hristo: it sounds like you completely misled us as to what the program does and what constrains its performance. Please don't hide things like that in the future.

Potatoswatter 2010-05-01 03:23:55

Answer 4

+2 A:

Yes, 1D should be faster than 2D. C and C++ arrays are always 1D internally. When you write something like

array[row][col]

the compiler actually calculates

col + row * maxcols

and uses that as the actual index of a 1d array. You might as well do that yourself. Cycling through an entire array will be way faster, and random access will be equally fast as in a 2d array.
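A small sketch of that index calculation, assuming a 3x4 array. The names flatIndex, flatMatches2D, kRows and kCols are made up for the example. It fills a 2D array and a flat buffer through the formula, then walks the flat buffer with a single incrementing index and checks that both agree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kRows = 3;  // example sizes, chosen arbitrarily
constexpr std::size_t kCols = 4;

// The row-major index calculation described above: col + row * maxcols.
std::size_t flatIndex(std::size_t row, std::size_t col, std::size_t maxcols) {
    assert(col < maxcols);
    return col + row * maxcols;
}

// Fills a 2D array and a flat buffer with the same values, then checks that
// a single incrementing index visits the flat buffer in the same order as
// the nested row/column loops visit the 2D array.
bool flatMatches2D() {
    int grid[kRows][kCols];
    std::vector<int> flat(kRows * kCols);
    int next = 0;
    for (std::size_t r = 0; r < kRows; ++r) {
        for (std::size_t c = 0; c < kCols; ++c) {
            grid[r][c] = next;
            flat[flatIndex(r, c, kCols)] = next;
            ++next;
        }
    }
    std::size_t k = 0;  // plain incrementing index, no per-element multiply
    for (std::size_t r = 0; r < kRows; ++r) {
        for (std::size_t c = 0; c < kCols; ++c) {
            if (flat[k] != grid[r][c]) return false;
            ++k;
        }
    }
    return true;
}
```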

mingos 2010-05-01 02:53:31

Can you please explain what you mean by "Cycling through an entire array will be way faster, and random access will be equally fast as in a 2d array." I'm not quite following you.

Hristo 2010-05-01 02:54:50

Cycling through a 1d array requires no extra calculations: just increment the iterator value and use it as the index. Random access will require the "col+row*maxcols" index value to be calculated.

mingos 2010-05-01 10:16:51

optimize 2D array in C++