I'm trying to write a CUDA program that counts the number of true values (defined as non-zero values) in a long vector using a reduction algorithm. The results are wrong: I get either 0 or ceil(N/threadsPerBlock)*threadsPerBlock, and neither is correct.

__global__ void count_reduce_logical(int *  l, int * cntl, int N){
    // cntl is assumed to be gridDim.x long and hold the per-block partial counts
    __shared__ int cache[threadsPerBlock];
    int cidx = threadIdx.x;
    int tid = threadIdx.x + blockIdx.x*blockDim.x;

    int cnt_tmp=0;
    while(tid<N){
        if(l[tid]!=0)
            cnt_tmp++;
        tid+=blockDim.x*gridDim.x;
    }
    cache[cidx]=cnt_tmp;
    __syncthreads();
    //reduce
    int k =blockDim.x/2;
    while(k!=0){
        if(threadIdx.x<k)
            cache[cidx] += cache[cidx];
        __syncthreads();
        k/=2;
    }
    if(cidx==0)
        cntl[blockIdx.x] = cache[0];
}

The host code then collects the cntl results and finishes the summation. This will be part of a larger project in which the data is already on the GPU, so it makes sense to do the computation there, provided it works correctly.

+1  A: 

In your reduction you're doing:

cache[cidx] += cache[cidx];

That adds each element to itself. Don't you want to be adding in the other half of the block's local values, i.e. cache[cidx + k]?
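For reference, here is a minimal sketch of the kernel with the reduction step reading from the other half of the shared array. It also seems to explain the symptom in the question: with cache[cidx] += cache[cidx], cache[0] just doubles on every pass, so each block reports thread 0's own count multiplied by blockDim.x. The value of threadsPerBlock (256), the cap of 64 blocks, and the host wrapper count_nonzero are assumptions made for illustration and are not from the original post. Error checking is omitted for brevity, and the halving loop assumes the block size is a power of two.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Assumed value (not given in the question); must be a power of two
// for the halving loop in the reduction to cover every element.
constexpr int threadsPerBlock = 256;

__global__ void count_reduce_logical(const int *l, int *cntl, int N) {
    // cntl must hold gridDim.x entries: one partial count per block.
    __shared__ int cache[threadsPerBlock];
    int cidx = threadIdx.x;
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    // Grid-stride loop: each thread counts its own share of non-zero values.
    int cnt_tmp = 0;
    while (tid < N) {
        if (l[tid] != 0)
            cnt_tmp++;
        tid += blockDim.x * gridDim.x;
    }
    cache[cidx] = cnt_tmp;
    __syncthreads();

    // Tree reduction: the lower half adds in the upper half, then halve k.
    for (int k = blockDim.x / 2; k != 0; k /= 2) {
        if (cidx < k)
            cache[cidx] += cache[cidx + k];  // was: cache[cidx] += cache[cidx]
        __syncthreads();
    }
    if (cidx == 0)
        cntl[blockIdx.x] = cache[0];
}

// Hypothetical host wrapper: copies the data over, launches the kernel,
// and finishes the summation of the per-block counts on the host.
int count_nonzero(const std::vector<int> &h) {
    int N = static_cast<int>(h.size());
    if (N == 0) return 0;
    int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    if (blocks > 64) blocks = 64;  // arbitrary cap; the grid-stride loop covers the rest

    int *d_l = nullptr, *d_cntl = nullptr;
    cudaMalloc(&d_l, N * sizeof(int));
    cudaMalloc(&d_cntl, blocks * sizeof(int));
    cudaMemcpy(d_l, h.data(), N * sizeof(int), cudaMemcpyHostToDevice);

    count_reduce_logical<<<blocks, threadsPerBlock>>>(d_l, d_cntl, N);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_cntl, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_l);
    cudaFree(d_cntl);

    int total = 0;
    for (int c : partial) total += c;
    return total;
}
```

In a project where the data already lives on the GPU, the copy into d_l would be dropped and only the small cntl array copied back.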

nsanders
Yes, I do. Nice catch, thank you.
Andrew Redd
+1  A: 

You can count the nonzero values with a single line of code using Thrust. Here's a code snippet that counts the number of 1s in a device_vector.

#include <thrust/count.h>
#include <thrust/device_vector.h>
...
// put three 1s in a device_vector
thrust::device_vector<int> vec(5,0);
vec[1] = 1;
vec[3] = 1;
vec[4] = 1;

// count the 1s
int result = thrust::count(vec.begin(), vec.end(), 1);
// result == 3

If your data does not live inside a device_vector, you can still use thrust::count by wrapping the raw pointers in thrust::device_ptr.
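As a rough sketch of the raw-pointer case, the following wraps a device pointer with thrust::device_pointer_cast and uses thrust::count_if with a predicate, so that every non-zero value is counted rather than only the 1s. The functor is_nonzero and the helpers count_nonzero_raw and count_nonzero_thrust are hypothetical names introduced for illustration. Error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <thrust/count.h>
#include <thrust/device_ptr.h>

// Hypothetical predicate: true for any non-zero value.
struct is_nonzero {
    __host__ __device__ bool operator()(int x) const { return x != 0; }
};

// Hypothetical helper: counts non-zero entries in a raw device array.
int count_nonzero_raw(int *d_l, int N) {
    // Wrap the raw pointer so Thrust knows the data lives on the device.
    thrust::device_ptr<int> first = thrust::device_pointer_cast(d_l);
    return static_cast<int>(thrust::count_if(first, first + N, is_nonzero()));
}

// Hypothetical test driver: stages host data on the device with plain
// cudaMalloc/cudaMemcpy, then counts through the raw pointer.
int count_nonzero_thrust(const std::vector<int> &h) {
    int N = static_cast<int>(h.size());
    if (N == 0) return 0;
    int *d_l = nullptr;
    cudaMalloc(&d_l, N * sizeof(int));
    cudaMemcpy(d_l, h.data(), N * sizeof(int), cudaMemcpyHostToDevice);
    int result = count_nonzero_raw(d_l, N);
    cudaFree(d_l);
    return result;
}
```

If the vector only ever holds 0s and 1s, thrust::count(first, first + N, 1) on the wrapped pointer would do the same job without the predicate.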

wnbell
This is a very nice solution as well, but I would rather figure out what is wrong with the code I have than learn a new library. Once I have the basics of what I'm doing down, I'll look at Thrust to speed up my coding.
Andrew Redd