I'm trying to optimize the C code below with a technique called loop unrolling, specifically four-way loop unrolling. I understand the concept; I just don't know how to apply it to this code. Do I have to add extra variables? Do I need some code after each loop, or only once after all the loops? The code uses 8x8 blocking to rotate an image of pixels 90 degrees counterclockwise. Any help would be greatly appreciated. Thank you.
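```c
/*
 * rotate8 - rotate with 8x8 blocking (90 degrees counterclockwise)
 * pixel and RIDX are defined elsewhere (not shown); RIDX(i, j, n) is assumed
 * to be the row-major index ((i)*(n)+(j)). Assumes dim is a multiple of 8.
 */
char rotate8_descr[] = "rotate8: rotate with 8x8 blocking";
void rotate8(int dim, pixel *src, pixel *dst)
{
    int i, j, ii, jj;
    for (ii = 0; ii < dim; ii += 8)              /* block row */
        for (jj = 0; jj < dim; jj += 8)          /* block column */
            for (i = ii; i < ii + 8; i++)        /* row within block */
                for (j = jj; j < jj + 8; j++)    /* column within block */
                    dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
}
```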