Hi all,
I wrote some code recently (ISO/ANSI C), and was surprised at the poor performance it achieved. Long story short, it turned out that the culprit was the floor() function. Not only it was slow, but it did not vectorize (with Intel compiler, aka ICL).
Here are some benchmarks for performing floor for all cells in a 2D matrix:
VC: 0.10
ICL: 0.20
Compare that to a simple cast:
VC: 0.04
ICL: 0.04
How can floor() be that much slower than a simple cast?! It does essentially the same thing (apart for negative numbers). 2nd question: Does someone know of a super-fast floor() implementation?
Thanks in advance,
Roger
PS: Here is the loop that I was benchmarking:
void Floor(float *matA, int *intA, const int height, const int width, const int width_aligned)
{
float *rowA=NULL;
int *intRowA=NULL;
int row, col;
for(row=0 ; row<height ; ++row){
rowA = matA + row*width_aligned;
intRowA = intA + row*width_aligned;
#pragma ivdep
for(col=0 ; col<width; ++col){
/*intRowA[col] = floor(rowA[col]);*/
intRowA[col] = (int)(rowA[col]);
}
}
}