Hi,
I'm writing a C# class to perform 2D separable convolution using integers, in order to get better performance than the double counterpart. The problem is that I don't get any real performance gain.
This is the X filter code (it is the same for both the int and double cases):
foreach (pixel)   // simplified: row-major scan over all pixels
{
    int value = 0;
    for (int k = 0; k < filterOffsetsX.Length; k++)
    {
        // index is relative to the current pixel position
        value += InputImage[index + filterOffsetsX[k]] * filterValuesX[k];
    }
    tempImage[index] = value;
}
In the integer case, "value", "InputImage" and "tempImage" are of types int, Image<byte> and Image<int>.
In the double case, "value", "InputImage" and "tempImage" are of types double, Image<double> and Image<double>.
(filterValuesX is an int[] in both cases.)
(The Image<T> class is part of an external dll. It should be similar to the .NET System.Drawing Image class.)
My goal is to achieve fast performance because int += (byte * int) should be cheaper than double += (double * int).
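In other words, the two inner-loop statements being compared boil down to this (the names here are just for illustration):

// int path: the byte pixel is widened to int for free by the multiply
intAccum += inputBytes[index + offset] * intCoeff;      // int += byte * int

// double path: the int coefficient is promoted to double on every multiply
dblAccum += inputDoubles[index + offset] * intCoeff;    // double += double * int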
The following times are the mean of 200 repetitions.
Filter size 9: 0.031 (double), 0.027 (int)
Filter size 13: 0.042 (double), 0.038 (int)
Filter size 25: 0.078 (double), 0.070 (int)
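(For reference, this is roughly how I measure; ApplyFilterX is a placeholder for the X filter loop above, not an actual method of the dll:)

using System.Diagnostics;

const int reps = 200;
var sw = Stopwatch.StartNew();
for (int r = 0; r < reps; r++)
    ApplyFilterX(InputImage, tempImage, filterOffsetsX, filterValuesX);
sw.Stop();
Console.WriteLine("mean per run: {0} s", sw.Elapsed.TotalSeconds / reps);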
The performance gain is minimal. Could this be caused by pipeline stalls and suboptimal code?
EDIT: simplified the code by deleting unimportant vars.
EDIT2: I don't think I have a cache-miss related problem, because "index" iterates through adjacent memory cells (row after row). Moreover, "filterOffsetsX" contains only small offsets relative to pixels on the same row, at a maximum distance of filter size / 2. The problem could be present in the second separable filter (the Y filter), but its times are not that different.
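For context, the Y pass has the same shape, except that the offsets step by whole rows, which is where cache misses would hurt if anywhere (a sketch; filterOffsetsY, filterValuesY, stride and outputImage are my placeholder names):

// Y pass sketch: same inner loop, but each offset is a multiple of the
// row stride, so successive taps land on different cache lines.
for (int k = 0; k < filterOffsetsY.Length; k++)
    filterOffsetsY[k] = (k - filterOffsetsY.Length / 2) * stride;

foreach (pixel)   // same simplified scan as above
{
    int value = 0;
    for (int k = 0; k < filterOffsetsY.Length; k++)
        value += tempImage[index + filterOffsetsY[k]] * filterValuesY[k];
    outputImage[index] = value;
}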