I have a CUDA program like this:
for (int i = 0; i < 100000; i++) {
    if (i % 2 == 0) {
        bind_x(x);                      // bind x to the texture
        kernel_code<<<A, B>>>(M, x, y); // calculate y = M*x
    }
    else {
        bind_x(y);                      // bind y to the texture
        kernel_code<<<A, B>>>(M, y, x); // calculate x = M*y
    }
    cudaThreadSynchronize();
    if (i % 2 == 0)
        unbind_x(x);                    // unbind x from the texture
    else
        unbind_x(y);                    // unbind y from the texture
}
I heard that if I do not call cudaThreadSynchronize(), the CPU will continue to run
without waiting for the kernel to finish. So should I call cudaThreadSynchronize()
before unbind_x()? I tried running with and without it, and the result is the same, although in theory it shouldn't be.
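
To make the pattern concrete, here is a minimal self-contained sketch of the same loop. It is only an illustration under assumptions, not my real code: scale_kernel and bind_buffer are hypothetical stand-ins for kernel_code and bind_x, the matrix multiply is replaced by a simple doubling, and it uses texture objects with cudaDeviceSynchronize() (the non-deprecated replacement for cudaThreadSynchronize()) instead of texture references. The synchronize call sits where it does in my loop; whether it is really required before releasing the texture is exactly what I am asking.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel_code: out = 2 * in, reading `in` through a texture.
__global__ void scale_kernel(cudaTextureObject_t in_tex, float *out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = 2.0f * tex1Dfetch<float>(in_tex, idx);
}

// Hypothetical stand-in for bind_x: wrap a linear device buffer in a texture object.
static cudaTextureObject_t bind_buffer(float *d_buf, int n) {
    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_buf;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc td;
    memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, NULL);
    return tex;
}

int main() {
    const int n = 1024;      // assumed problem size
    const int iters = 10;    // shortened from 100000 for the sketch
    static float h[n];
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    float *x = NULL, *y = NULL;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemcpy(x, h, n * sizeof(float), cudaMemcpyHostToDevice);

    for (int i = 0; i < iters; i++) {
        // Ping-pong: even iterations read x and write y, odd ones the reverse.
        float *src = (i % 2 == 0) ? x : y;
        float *dst = (i % 2 == 0) ? y : x;

        cudaTextureObject_t tex = bind_buffer(src, n);
        scale_kernel<<<(n + 255) / 256, 256>>>(tex, dst, n);

        // The kernel launch returns to the host immediately; this call
        // blocks the host until the kernel has finished.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
            return 1;
        }

        cudaDestroyTextureObject(tex);  // counterpart of unbind_x
    }

    // With an even number of iterations the last write went to x.
    cudaMemcpy(h, x, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

This should compile with nvcc on a toolkit that supports texture objects (for example, nvcc sketch.cu -o sketch, where the file name is arbitrary).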