While reading a post on StackOverflow (http://stackoverflow.com/questions/1502081/im-trying-to-optimize-this-c-code-using-4-way-loop-unrolling), which is now marked as closed, I came across an answer (a comment, in fact) that said the following: "The two inner loops could possibly get a speed boost by using UInt64 and bit shifting".
Here is the code that was in the post:
    char rotate8_descr[] = "rotate8: rotate with 8x8 blocking";
    void rotate8(int dim, pixel *src, pixel *dst)
    {
        int i, j, ii, jj;

        /* Walk the image in 8x8 blocks; (ii, jj) is the block's top-left corner. */
        for (ii = 0; ii < dim; ii += 8)
            for (jj = 0; jj < dim; jj += 8)
                for (i = ii; i < ii + 8; i++)
                    for (j = jj; j < jj + 8; j++)
                        dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }
Could anyone please explain how that would be applied here? I am interested in how to apply bit shifting to this code, or to similar code, and why it would help performance. Also, how could this code be optimized for cache usage? Any suggestions?
Assume this code was double tiled/blocked (outer tiles of 32, with inner tiles of 16 inside them) and loop-invariant code motion was also applied. Would it still benefit from bit shifting and UInt64?
If not, then what other suggestions would work?
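To make the second question concrete, here is a rough sketch of the double-tiled version I have in mind. The `pixel` layout and the `RIDX` macro are my assumptions (neither is shown in the original post). `rotate_naive` and `rotate_tiled` are just illustrative names, and the sketch only handles a `dim` that is a multiple of 32.

```c
#include <assert.h>

/* Assumed definitions: the original post shows neither of these. */
typedef struct { unsigned short red, green, blue; } pixel;  /* hypothetical layout */
#define RIDX(i, j, n) ((i) * (n) + (j))                      /* assumed row-major index */

#define BIG   32  /* outer tile edge */
#define SMALL 16  /* inner tile edge */

/* Reference version without any blocking, used only to check the tiled one. */
void rotate_naive(int dim, const pixel *src, pixel *dst)
{
    for (int i = 0; i < dim; i++)
        for (int j = 0; j < dim; j++)
            dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
}

/* Double-tiled rotation: 32x32 outer tiles, each split into 16x16 inner tiles. */
void rotate_tiled(int dim, const pixel *src, pixel *dst)
{
    assert(dim > 0 && dim % BIG == 0);  /* sketch only handles multiples of BIG */

    for (int bi = 0; bi < dim; bi += BIG)
        for (int bj = 0; bj < dim; bj += BIG)
            for (int ii = bi; ii < bi + BIG; ii += SMALL)
                for (int jj = bj; jj < bj + BIG; jj += SMALL)
                    for (int i = ii; i < ii + SMALL; i++) {
                        /* Loop-invariant code motion: both start indices depend
                           only on i and jj, so compute them once per row. */
                        int si = RIDX(i, jj, dim);
                        int di = RIDX(dim - 1 - jj, i, dim);
                        for (int j = 0; j < SMALL; j++) {
                            dst[di] = src[si];
                            si += 1;    /* next column in src row i */
                            di -= dim;  /* one row up in dst column i */
                        }
                    }
}
```

With the index arithmetic hoisted like this, the innermost loop is just one pixel copy plus two additions, which is why I am unsure whether UInt64 and bit shifting would still buy anything.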