Hi! I'm trying to parallelize a convolution function in C. Here's the original function which convolves two arrays of 64-bit floats:
void convolve(const Float64 *in1,
              UInt32 in1Len,
              const Float64 *in2,
              UInt32 in2Len,
              Float64 *results)
{
    UInt32 i, j;
    for (i = 0; i < in1Len; i++) {
        for (j = 0; j < in2Len; j++) {
            results[i+j] += in1[i] * in2[j];
        }
    }
}
In order to allow for concurrency (without semaphores), I created a function that computes the result for a particular position in the results array:
void convolveHelper(const Float64 *in1,
                    UInt32 in1Len,
                    const Float64 *in2,
                    UInt32 in2Len,
                    Float64 *result,
                    UInt32 outPosition)
{
    UInt32 i, j;
    for (i = 0; i < in1Len; i++) {
        if (i > outPosition)
            break;
        j = outPosition - i;
        if (j >= in2Len)
            continue;
        *result += in1[i] * in2[j];
    }
}
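For context, here is a minimal sketch of how the helper could be driven, with one call per output position. The name convolveAll and the two typedefs are assumptions made only so the snippet compiles on its own (they stand in for the platform's Float64 and UInt32), and the helper is repeated unchanged. Each iteration of the outer loop writes only results[k], which is why no locking should be needed if the iterations are split across threads.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the platform typedefs used in the question. */
typedef double   Float64;
typedef uint32_t UInt32;

/* The helper from above, repeated so this sketch is self-contained. */
void convolveHelper(const Float64 *in1, UInt32 in1Len,
                    const Float64 *in2, UInt32 in2Len,
                    Float64 *result, UInt32 outPosition)
{
    UInt32 i, j;
    for (i = 0; i < in1Len; i++) {
        if (i > outPosition)
            break;
        j = outPosition - i;
        if (j >= in2Len)
            continue;
        *result += in1[i] * in2[j];
    }
}

/* Hypothetical driver: fills results[0 .. in1Len + in2Len - 2].
 * Every iteration touches a different element of results, so the
 * loop body is independent from one output position to the next. */
void convolveAll(const Float64 *in1, UInt32 in1Len,
                 const Float64 *in2, UInt32 in2Len,
                 Float64 *results)
{
    UInt32 k;
    UInt32 outLen;

    assert(in1Len > 0 && in2Len > 0);   /* output length needs non-empty inputs */
    outLen = in1Len + in2Len - 1;

    for (k = 0; k < outLen; k++) {
        results[k] = 0.0;               /* the helper accumulates with += */
        convolveHelper(in1, in1Len, in2, in2Len, &results[k], k);
    }
}
```

Called with in1 = {1, 2, 3} and in2 = {4, 5}, this fills a four-element results array with the same values the original convolve produces on a zeroed array.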
The problem is that using convolveHelper makes the code about 3.5 times slower (when running on a single thread).
Any ideas on how I can speed up convolveHelper while maintaining thread safety?
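One possible direction, as an unmeasured sketch: the helper's loop visits every i up to outPosition and skips those where j falls outside in2, so the valid range of i could instead be computed up front. The name convolveHelperBounded and the typedefs below are assumptions for illustration, not an existing API. The sum is also kept in a local variable and written through the pointer once. Whether this recovers the 3.5x gap would need to be benchmarked.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the platform typedefs used in the question. */
typedef double   Float64;
typedef uint32_t UInt32;

/* Hypothetical variant of convolveHelper with the branches hoisted out
 * of the loop. The valid i satisfy:
 *   i <= outPosition, i < in1Len, and outPosition - i < in2Len. */
void convolveHelperBounded(const Float64 *in1, UInt32 in1Len,
                           const Float64 *in2, UInt32 in2Len,
                           Float64 *result, UInt32 outPosition)
{
    UInt32 first, last, i;
    Float64 sum = 0.0;

    assert(in1Len > 0 && in2Len > 0);

    /* Smallest i for which j = outPosition - i is still below in2Len. */
    first = (outPosition >= in2Len) ? outPosition - in2Len + 1 : 0;
    /* Largest i that is both inside in1 and not past outPosition. */
    last = (outPosition < in1Len) ? outPosition : in1Len - 1;

    /* If outPosition is past the end of the output, first > last and
     * the loop body never runs. */
    for (i = first; i <= last; i++)
        sum += in1[i] * in2[outPosition - i];

    *result += sum;   /* same accumulate-into-result contract as the original */
}
```

It keeps the same thread-safety property as convolveHelper, since a call still writes only to the single location that result points to.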