SIMD optimization puzzle
I want to optimize the following function using SIMD (SSE2 & such):

    int64_t fun(int64_t N, int size, int* p)
    {
        int64_t sum = 0;
        for(int i=1; i<size; i++)
            sum += (N/i)*p[i];
        return sum;
    }

This seems like an eminently vectorizable task, except that the needed instructions just aren't there ... We can assume that N i...
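To make the "missing instructions" point concrete: SSE2 has no packed integer division and no packed 64x64-bit multiply, so the loop body cannot be mapped directly onto integer SIMD. One possible workaround, sketched below, is to do only the division in packed double precision and keep the multiply-accumulate scalar. This is a rough sketch, not a tuned solution. It assumes 0 <= N < 2^53 (my assumption, chosen so that truncating the double quotient gives the exact integer quotient) and an x86-64 target (needed for `_mm_cvttsd_si64`). The names `fun_scalar` and `fun_sse2_sketch` are made up for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Reference: the original loop, unchanged apart from the const qualifier. */
int64_t fun_scalar(int64_t N, int size, const int *p)
{
    int64_t sum = 0;
    for (int i = 1; i < size; i++)
        sum += (N / i) * p[i];
    return sum;
}

/* Sketch: two divisions per iteration via _mm_div_pd.
 * ASSUMPTION (not from the question): 0 <= N < 2^53, so N and every i are
 * exactly representable as doubles and trunc(N / i) computed in double
 * equals the integer quotient. Requires x86-64 for _mm_cvttsd_si64. */
int64_t fun_sse2_sketch(int64_t N, int size, const int *p)
{
    int64_t sum = 0;
    const __m128d vn   = _mm_set1_pd((double)N);
    const __m128d vtwo = _mm_set1_pd(2.0);
    __m128d vi = _mm_set_pd(2.0, 1.0);  /* low lane = 1.0, high lane = 2.0 */

    int i = 1;
    for (; i + 1 < size; i += 2) {
        __m128d vq = _mm_div_pd(vn, vi);  /* N/i and N/(i+1) as doubles */
        /* Truncate each lane to a 64-bit integer (scalar conversions,
         * since SSE2 has no packed double -> int64 conversion). */
        int64_t q0 = _mm_cvttsd_si64(vq);
        int64_t q1 = _mm_cvttsd_si64(_mm_unpackhi_pd(vq, vq));
        sum += q0 * p[i] + q1 * p[i + 1];
        vi = _mm_add_pd(vi, vtwo);        /* advance to (i+2, i+3) */
    }
    for (; i < size; i++)                 /* leftover element, if any */
        sum += (N / i) * p[i];
    return sum;
}
```

For example, with N = 100 and p = {0, 1, 2, 3} (size 4), both functions compute 100*1 + 50*2 + 33*3. Whether this is actually faster than the scalar loop depends on the CPU, since the 64-bit integer divide it replaces and the packed double divide have very different costs across microarchitectures, so it would need to be measured.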