views:

181

answers:

2

How would I unroll the following nested loops?

for(k = begin; k != end; ++k) {
 for(j = 0; j < Emax; ++j) {
  for(i = 0; i < N; ++i) { 
   if (j >= E[i]) continue; 
   array[k] += foo(i, tr[k][i], ex[j][i]);
  }
 }
}

I tried the following, but my output isn't the same, and it should be:

for(k = begin; k != end; ++k) {
 for(j = 0; j < Emax; ++j) {
  for(i = 0; i+4 < N; i+=4) { 
   if (j >= E[i]) continue; 
   array[k] += foo(i, tr[k][i], ex[j][i]);
   array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]);
   array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]);
   array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]);
  }
  if (i < N) {
   for (; i < N; ++i) {
    if (j >= E[i]) continue; 
    array[k] += foo(i, tr[k][i], ex[j][i]);
   }
  }
 }
}

I will be running this code in parallel using Intel's TBB so that it takes advantage of multiple cores. After this is finished running, another function prints out what is in array[] and right now, with my unrolling, the output isn't identical. Any help is appreciated.

Update: I fixed it. I used the answer for this question to do the unrolling... the output wasn't matching because I wasn't doing array[k] = 0; after the first for loop.

Thanks, Hristo

+2  A: 
   if (j >= E[i]) continue; 
   array[k] += foo(i, tr[k][i], ex[j][i]);
   array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]);
   array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]);
   array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]);

versus

if (j >= E[i]) continue; 
array[k] += foo(i, tr[k][i], ex[j][i]);

Screening conditions are not identical

a better approach to screening (eliminate branching):

array[k] += (j < E[i])*foo(i, tr[k][i], ex[j][i]);

also, you need to guarantee N is divisible by 4 otherwise you may overshoot. alternatively, truncate N to be divisible by four (N - N%4)

aaa
that is why i added the if (i < N) after the 3rd for loop to account for what I have missed if N is not divisible by 4.
Hristo
@hri okay, I did not see that. than most likely it's only screening condition.
aaa
you make a good point. I added a "if ( j >= E[i+1] ) " for each of the 4 foo calls (I did +2 and +3 as well) but that didn't seem to fix it :/
Hristo
How does this work? array[k] += (j < E[i])*foo(i, tr[k][i], ex[j][i]);I don't understand how the multiplication works?
Hristo
@hri Boolean expression is either zero or one. so you either add zero or expression value. note that this does not reduce work, only skips undesired values. if you look to reduce work, depending what foo is, screening may harm performance. branch instructions are very expensive.
aaa
A: 

I think that the if (j >= E[i]) continue; is your problem. In the original, this test is run for every index i. In your unrolled version, it is only tested for every fourth index. Try the following:

for (i = 0; i < N; /*advanced in loop*/) {
    if (j >= E[i]) continue;
    array[k] += foo(i, tr[k][i], ex[j][i]); ++i;
    if (j >= E[i]) continue;
    array[k] += foo(i, tr[k][i], ex[j][i]); ++i;
    if (j >= E[i]) continue;
    array[k] += foo(i, tr[k][i], ex[j][i]); ++i;
    if (j >= E[i]) continue;
    array[k] += foo(i, tr[k][i], ex[j][i]); ++i;
}
while (i < N) {
    if (j >= E[i]) {
        ++i; // missing in original version
        continue;
    }
    array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
}

Edit: I forgot to increment an index in the original version that was causing an infinite loop when j >= E[i].

D.Shawley
hmm... This makes sense. However, its still not working. I'm not sure why. With the while loop at the end, it seems to get stuck in an infinite loop. But if I use my for loop like above, the output is incorrect.
Hristo
what if `if (j >= E[i]) continue` is always true? you get infinite loop
aaa
The continue skips too far. Try if (j < E[i]) { array[k]+= ... } ++i;
Accipitridae
@Accipitridae... that didn't work :(
Hristo
@Hristo aaa and D.Shawley have clearly identified a bug in your code. Since D.Shawley's proposal has also a small bug, it is quite difficult to estimate what you were trying. It would be helpful if you fixed your code and if you would update your answer with the fixed code, so that it is possible to look for remaining errors.
Accipitridae
Thanks. I updated the question with the bug I had that was producing weird outputs.
Hristo