Hi, I'm having some trouble with a very basic CUDA program. The program multiplies two vectors on the host and on the device and then compares the results. This works without a problem. The trouble starts when I test different numbers of threads and blocks for learning purposes. I have the following kernel:

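__global__ void multiplyVectorsCUDA(float *a, float *b, float *c, int N){
    // threadIdx.x is the thread's index within its block, so this
    // indexing only covers elements 0..nThreads-1 of a single block.
    int idx = threadIdx.x;
    if (idx < N)
        c[idx] = a[idx] * b[idx];
}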

which I call like:

multiplyVectorsCUDA<<<nBlocks, nThreads>>>(vector_a_d, vector_b_d, vector_c_d, N);

For the moment I've fixed nBlocks to 1, so I only vary the vector size N and the number of threads nThreads. From what I understand, there will be one thread for each multiplication, so N and nThreads should be equal.

The problem is the following:

  1. I first call the kernel with N=16 and nThreads<16, which doesn't work. (This is expected.)
  2. Then I call it with N=16 and nThreads=16, which works fine. (Again, as expected.)
  3. But when I call it again with N=16 and nThreads<16, it still works!

I don't understand why the last step doesn't fail like the first one. It only fails again if I restart my PC.

Has anyone run into something like this before or can explain this behavior?
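The requirement that N and nThreads be equal follows from the kernel indexing only by threadIdx.x. As a minimal sketch of one common alternative, assuming a grid-stride loop is acceptable here, the variant below lets any launch configuration cover all N elements. The names multiplyVectorsStrided and countCorrect are hypothetical and not part of the original code, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of the question's kernel: each thread starts at its
// global index and then advances by the total number of launched threads.
__global__ void multiplyVectorsStrided(const float *a, const float *b, float *c, int N) {
    int stride = blockDim.x * gridDim.x;
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < N; idx += stride)
        c[idx] = a[idx] * b[idx];
}

// Hypothetical helper: runs the kernel on N = 16 elements and returns how
// many device results equal the product computed on the host.
int countCorrect(int nBlocks, int nThreads) {
    const int N = 16;
    const size_t bytes = N * sizeof(float);
    float a_h[N], b_h[N], c_h[N];
    for (int i = 0; i < N; ++i) { a_h[i] = i + 1.0f; b_h[i] = 2.0f; }

    float *a_d, *b_d, *c_d;
    cudaMalloc(&a_d, bytes);
    cudaMalloc(&b_d, bytes);
    cudaMalloc(&c_d, bytes);
    cudaMemcpy(a_d, a_h, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b_h, bytes, cudaMemcpyHostToDevice);
    cudaMemset(c_d, 0, bytes);  // start the output from a known state

    multiplyVectorsStrided<<<nBlocks, nThreads>>>(a_d, b_d, c_d, N);
    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(c_h, c_d, bytes, cudaMemcpyDeviceToHost);

    int correct = 0;
    for (int i = 0; i < N; ++i)
        if (c_h[i] == a_h[i] * b_h[i]) ++correct;

    cudaFree(a_d);
    cudaFree(b_d);
    cudaFree(c_d);
    return correct;
}

int main() {
    // Fewer threads than elements, with one block and with two blocks.
    printf("1 block  x 4 threads: %d of 16 correct\n", countCorrect(1, 4));
    printf("2 blocks x 3 threads: %d of 16 correct\n", countCorrect(2, 3));
    return 0;
}
```

With this indexing, a launch with fewer than N threads should still fill every element, so it is not a useful way to provoke the failure from step 1. The original kernel is the one to keep for that experiment.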

+2  A: 

Wait, so are you calling all three in a row? I don't know the rest of your code, but are you sure you're clearing the graphics memory you allocated between runs? If not, that could explain why it fails the first time but works the third time with the same values, and why the expected failure only comes back after rebooting (rebooting clears all the allocated memory).
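To make the stale-memory idea concrete, here is a rough sketch, assuming the three calls reuse the same output buffer within one process. It runs the kernel from the question with 16 threads and then with 8, optionally zeroing the output with cudaMemset in between. The helper name countMatchesAfterPartialRun and the test data are made up for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel as posted in the question (single-block indexing).
__global__ void multiplyVectorsCUDA(float *a, float *b, float *c, int N) {
    int idx = threadIdx.x;
    if (idx < N)
        c[idx] = a[idx] * b[idx];
}

// Hypothetical helper: full launch, optional clear, then a launch with only
// 8 threads. Returns how many of the 16 outputs match the host product.
int countMatchesAfterPartialRun(bool clearBetweenRuns) {
    const int N = 16;
    const size_t bytes = N * sizeof(float);
    float a_h[N], b_h[N], c_h[N];
    for (int i = 0; i < N; ++i) { a_h[i] = i + 1.0f; b_h[i] = 2.0f; }

    float *vector_a_d, *vector_b_d, *vector_c_d;
    cudaMalloc(&vector_a_d, bytes);
    cudaMalloc(&vector_b_d, bytes);
    cudaMalloc(&vector_c_d, bytes);
    cudaMemcpy(vector_a_d, a_h, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(vector_b_d, b_h, bytes, cudaMemcpyHostToDevice);

    // Run with nThreads == N: every element of vector_c_d gets written.
    multiplyVectorsCUDA<<<1, 16>>>(vector_a_d, vector_b_d, vector_c_d, N);

    // Without this clear, the results of the full run stay in the buffer.
    if (clearBetweenRuns)
        cudaMemset(vector_c_d, 0, bytes);

    // Run with nThreads < N: only elements 0..7 are written this time.
    multiplyVectorsCUDA<<<1, 8>>>(vector_a_d, vector_b_d, vector_c_d, N);
    cudaMemcpy(c_h, vector_c_d, bytes, cudaMemcpyDeviceToHost);

    int matches = 0;
    for (int i = 0; i < N; ++i)
        if (c_h[i] == a_h[i] * b_h[i]) ++matches;

    cudaFree(vector_a_d);
    cudaFree(vector_b_d);
    cudaFree(vector_c_d);
    return matches;
}

int main() {
    printf("without clearing: %d of 16 match\n", countMatchesAfterPartialRun(false));
    printf("with clearing:    %d of 16 match\n", countMatchesAfterPartialRun(true));
    return 0;
}
```

If the buffer is not cleared, the 8-thread run can look correct only because the last 8 elements still hold the earlier results. Clearing the output (or filling it with a sentinel value) before each launch makes an incomplete run show up in the comparison.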

fire.eagle
kirbuchi
Hmm. Honestly, I haven't touched CUDA in a few months. I do have some notes at home though with a few problems I had when I was working with it. I'll check against those later when I get back home. Sorry I couldn't be more help right now.
fire.eagle
Hey it's ok. Thanks for your answer. I'll keep trying and ask in the CUDA forums to see if I get lucky.
kirbuchi
+1  A: 

Don't know if it's OK to answer my own question, but I realized I had a bug in my code when comparing the host and device vectors (that part of the code wasn't posted). Sorry for the inconvenience. Could someone please close this post, since it won't let me delete it?
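Since the comparison code was never posted, the following is only a sketch of how such a host-side check might look, not a reconstruction of the original or of its bug. The function name firstMismatch and the absolute tolerance are assumptions. It reports the index of the first element that differs instead of a bare pass/fail, which makes an incomplete device result easier to spot.

```cuda
#include <cmath>
#include <cstdio>

// Hypothetical helper: returns the index of the first element where the host
// result and the copied-back device result differ by more than tol, or -1 if
// all N elements agree. Written so that a NaN also counts as a mismatch.
int firstMismatch(const float *host, const float *fromDevice, int N, float tol) {
    for (int i = 0; i < N; ++i) {
        if (!(std::fabs(host[i] - fromDevice[i]) <= tol))
            return i;
    }
    return -1;
}

int main() {
    // Stand-in data: the second array mimics a device result in which only
    // the first two elements were computed.
    float host[4]       = {2.0f, 4.0f, 6.0f, 8.0f};
    float fromDevice[4] = {2.0f, 4.0f, 0.0f, 0.0f};

    int bad = firstMismatch(host, fromDevice, 4, 1e-6f);
    if (bad < 0)
        printf("vectors match\n");
    else
        printf("first mismatch at index %d\n", bad);
    return 0;
}
```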

kirbuchi
Just mark this response as the answer and then it can be closed.
Drew Marsh
ok, ty, now it can be closed.
kirbuchi