I have a kernel that takes three arrays. The first array, d_A1, holds no input data and is used only to write results back; the other two arrays, d_D1 and d_ST1, hold input data.

The size of the first array is:

d_A1[13000000]

The size of the second array is:

d_D1[421]

The size of the third array is:

d_ST1[21]

N is 13000000

TestArray<<>>(d_A1,N, d_D1, d_ST1);

Now I want to pass only the data of d_D1[421] and d_ST1[21] to shared arrays, so I created the shared arrays as:

__global__ void TestArray(int* A1, unsigned int N,  int* D1, unsigned int* ST1)
{

   unsigned int __align__(16) tid = threadIdx.x;
   unsigned int __align__(16) idx = __umul24(blockDim.x, blockIdx.x) + threadIdx.x;  
   __shared__ unsigned int __align__(16) s_D1[441];  //Shared array for d_D1
   __shared__ unsigned int __align__(16) s_ST1[21];  //Shared array for d_ST1

   if (idx < N)   //13000000

   {

Q. How do I pass the data of d_D1[441] and d_ST1[21] to s_D1[441] and s_ST1[21]? I tried:

      while (idx < 441)

        s_D1[tid] = d_D1[idx] 

      __syncthreads(); 


      while (idx < 21)

        s_ST1[tid] = d_ST1[idx] 


      __syncthreads();  

but the computer freezes and I have to restart it. I also tried one at a time, namely only the first while and then only the second while, with no luck.

If I use global memory directly, namely d_D1 and d_ST1, everything works. So the question is: How do you pass data to a shared variable/array when the size of the array is not N?

   }   //End of kernel processing



}
+1  A: 

Great, you're right, it has to be if (tid < 21) d_ST1[tid] = ST1[tid], since one block is enough for those two sizes. Thank you very much indeed. However, when do you use if (idx < n) d_st1[tid] = st1[idx]?

Mark
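For reference, here is a minimal sketch of the tid-based load confirmed in the comment above. It rests on a few assumptions that are not in the original post: the D1 length is taken as 441 to match the s_D1[441] declaration, s_D1 is declared int to match the int* parameter, the write to A1 is a placeholder computation, and count_mismatches is a hypothetical host helper. The loads use a strided for loop instead of a plain if so they also work when the block has fewer than 441 threads. Unlike the while (idx < 441) attempt, whose condition never changes inside the loop body, these loops terminate, and __syncthreads() sits outside the if (idx < N) branch so every thread in the block reaches it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define D1_LEN 441   // assumption: matches s_D1[441] in the question
#define ST1_LEN 21

__global__ void TestArray(int* A1, unsigned int N, const int* D1, const unsigned int* ST1)
{
    __shared__ int s_D1[D1_LEN];             // per-block copy of D1
    __shared__ unsigned int s_ST1[ST1_LEN];  // per-block copy of ST1

    unsigned int tid = threadIdx.x;
    unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;

    // Each block fills its own shared copy, indexed by tid (not idx).
    // The stride of blockDim.x covers arrays longer than the block.
    for (unsigned int i = tid; i < D1_LEN; i += blockDim.x) s_D1[i] = D1[i];
    for (unsigned int i = tid; i < ST1_LEN; i += blockDim.x) s_ST1[i] = ST1[i];
    __syncthreads();  // reached by all threads of the block

    if (idx < N) {
        // Placeholder computation, only to show the shared data being used.
        A1[idx] = s_D1[idx % D1_LEN] + (int)s_ST1[idx % ST1_LEN];
    }
}

// Hypothetical host helper: runs the kernel and counts wrong outputs.
int count_mismatches(unsigned int N, unsigned int threads)
{
    std::vector<int> h_D1(D1_LEN);
    std::vector<unsigned int> h_ST1(ST1_LEN);
    for (int i = 0; i < D1_LEN; ++i) h_D1[i] = 2 * i;
    for (int i = 0; i < ST1_LEN; ++i) h_ST1[i] = 100u + i;

    int* d_A1 = nullptr;
    int* d_D1 = nullptr;
    unsigned int* d_ST1 = nullptr;
    cudaMalloc(&d_A1, N * sizeof(int));
    cudaMalloc(&d_D1, D1_LEN * sizeof(int));
    cudaMalloc(&d_ST1, ST1_LEN * sizeof(unsigned int));
    cudaMemcpy(d_D1, h_D1.data(), D1_LEN * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ST1, h_ST1.data(), ST1_LEN * sizeof(unsigned int), cudaMemcpyHostToDevice);

    unsigned int blocks = (N + threads - 1) / threads;  // round up
    TestArray<<<blocks, threads>>>(d_A1, N, d_D1, d_ST1);

    std::vector<int> h_A1(N);
    cudaMemcpy(h_A1.data(), d_A1, N * sizeof(int), cudaMemcpyDeviceToHost);

    int bad = 0;
    for (unsigned int i = 0; i < N; ++i)
        if (h_A1[i] != h_D1[i % D1_LEN] + (int)h_ST1[i % ST1_LEN]) ++bad;

    cudaFree(d_A1);
    cudaFree(d_D1);
    cudaFree(d_ST1);
    return bad;
}

int main()
{
    // Smaller N than the 13000000 in the question, to keep the demo quick.
    printf("mismatches: %d\n", count_mismatches(100000, 256));
    return 0;
}
```

Because shared memory is private to each block, every block has to load its own copy of the two small arrays, which is why the index is tid and not idx.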
One case where you would use if (idx < n) d_st1[tid] = st1[idx] is when the length of st1 is n, n < 512, and the launch configuration (blocks, threads) is <<<M/n, M>>>. The more likely pattern is for n to be large, so you'll want to split the data among blocks. Check out NVIDIA's matrix multiply for a good example.
jeff7
Should it be <<<n/M, M>>>, i.e. n divided by the number of threads per block?
Mark
Yes, <<<n/M, M>>>.
jeff7
So, if I understood correctly, I would use "if (idx < n) d_st1[tid] = st1[idx]" only when st1 has length n, where n could be any number > 512 (the maximum number of threads in a block).
Mark
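The idx-based pattern discussed in these comments, where a large array of length n is split among blocks, might look like the sketch below. Everything beyond the if (idx < n) load is an assumption made for illustration: the kernel name ReverseTiles, the tile-reversal step, the block size M = 256, and the host helper count_tile_mismatches are all hypothetical. The grid size is rounded up, (n + M - 1) / M, a small variation on the n/M from the comments so that a final partial tile is still covered when n is not a multiple of M.

```cuda
#include <cstdio>
#include <vector>
#include <algorithm>
#include <cuda_runtime.h>

#define M 256  // threads per block (assumed value)

// Each block copies its own M-element slice of st1 into shared memory,
// then writes that slice back in reverse order.
__global__ void ReverseTiles(const unsigned int* st1, unsigned int* out, unsigned int n)
{
    __shared__ unsigned int s_tile[M];

    unsigned int tid  = threadIdx.x;
    unsigned int base = blockDim.x * blockIdx.x;  // first element of this block's slice
    unsigned int idx  = base + tid;

    if (idx < n) s_tile[tid] = st1[idx];  // shared index is tid, global index is idx
    __syncthreads();                      // outside the branch: all threads reach it

    // Number of valid elements in this block's slice (the last one may be partial).
    unsigned int valid = min(blockDim.x, n - base);
    if (idx < n) out[idx] = s_tile[valid - 1 - tid];
}

// Hypothetical host helper: runs the kernel and counts wrong outputs.
int count_tile_mismatches(unsigned int n)
{
    std::vector<unsigned int> h_in(n), h_out(n);
    for (unsigned int i = 0; i < n; ++i) h_in[i] = i;

    unsigned int* d_in = nullptr;
    unsigned int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(unsigned int));
    cudaMalloc(&d_out, n * sizeof(unsigned int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(unsigned int), cudaMemcpyHostToDevice);

    unsigned int blocks = (n + M - 1) / M;  // round up so the tail is covered
    ReverseTiles<<<blocks, M>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    int bad = 0;
    for (unsigned int i = 0; i < n; ++i) {
        unsigned int base  = (i / M) * M;
        unsigned int valid = std::min<unsigned int>(M, n - base);
        if (h_out[i] != h_in[base + valid - 1 - (i - base)]) ++bad;
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return bad;
}

int main()
{
    printf("mismatches: %d\n", count_tile_mismatches(100000));
    return 0;
}
```

Here the shared array only needs M elements per block, regardless of n, because each block stages just its own slice.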