Hi.
I'm just starting out learning OpenCL. I'm trying to get a feel for what performance gains to expect when moving functions/algorithms to the GPU.
The most basic kernel given in most tutorials is a kernel that takes two arrays of numbers and sums the value at the corresponding indexes and adds them to a third array, like so:
__kernel void
add(__global float *a,
__global float *b,
__global float *answer)
{
int gid = get_global_id(0);
answer[gid] = a[gid] + b[gid];
}
__kernel void
sub(__global float* n,
__global float* answer)
{
int gid = get_global_id(0);
answer[gid] = n[gid] - 2;
}
__kernel void
ranksort(__global const float *a,
__global float *answer)
{
int gid = get_global_id(0);
int gSize = get_global_size(0);
int x = 0;
for(int i = 0; i < gSize; i++){
if(a[gid] > a[i]) x++;
}
answer[x] = a[gid];
}
I am assuming that you could never justify computing this on the GPU, the memory transfer would out weight the time it would take computing this on the CPU by magnitudes (I might be wrong about this, hence this question).
What I am wondering is what would be the most trivial example where you would expect significant speedup when using a OpenCL kernel instead of the CPU?