ansaurus

Question

CUDA counting, reduction and thread warps

Answer 1

+1 A:

In your reduction you're doing:

cache[cidx] += cache[cidx];

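That adds each element to itself, so it doubles the value instead of combining partial sums. Don't you want to add in the corresponding value from the other half of the block's local values?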
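For illustration, here is a minimal sketch of what a corrected reduction step might look like. It is not the asker's actual code: it assumes a power-of-two block size, at least one input element, and a shared array named cache indexed by cidx as in the line quoted above. The kernel name, the stride variable, the BLOCK constant, and the host helper count_nonzero are hypothetical names introduced only for this example.

```cuda
#include <cuda_runtime.h>

// Hypothetical block size; must be a power of two for the halving loop below.
constexpr int BLOCK = 256;

// Each block counts the nonzero entries in its slice and adds the result to *total.
__global__ void count_nonzero_kernel(const int* data, int n, int* total)
{
    __shared__ int cache[BLOCK];
    int cidx = threadIdx.x;
    int gidx = blockIdx.x * blockDim.x + threadIdx.x;

    // One flag per thread: 1 if this thread's element is nonzero, else 0.
    cache[cidx] = (gidx < n && data[gidx] != 0) ? 1 : 0;
    __syncthreads();

    // Tree reduction: each active thread adds its partner from the other half.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (cidx < stride)
            cache[cidx] += cache[cidx + stride];  // not cache[cidx] += cache[cidx]
        __syncthreads();  // every thread reaches this, active or not
    }

    // Thread 0 holds the block's partial count; fold it into the global total.
    if (cidx == 0)
        atomicAdd(total, cache[0]);
}

// Hypothetical host helper: copies the input to the device and returns the count.
// Assumes n > 0; error checking is omitted to keep the sketch short.
int count_nonzero(const int* host, int n)
{
    int *d_data = nullptr, *d_total = nullptr, result = 0;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_total, sizeof(int));
    cudaMemcpy(d_data, host, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(int));

    int blocks = (n + BLOCK - 1) / BLOCK;
    count_nonzero_kernel<<<blocks, BLOCK>>>(d_data, n, d_total);

    cudaMemcpy(&result, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_total);
    return result;
}
```

Calling count_nonzero on a host array such as {0, 1, 0, 1, 1} with n = 5 should return 3. The __syncthreads() call sits outside the if so that all threads in the block hit the barrier on every pass.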

nsanders 2010-10-14 23:41:16

Yes, I do. Nice catch, thank you.

Andrew Redd 2010-10-15 03:19:57

Answer 2

+1 A:

You can count the nonzero values with a single line of code using Thrust. Here's a code snippet that counts the number of 1s in a device_vector.

#include <thrust/count.h>
#include <thrust/device_vector.h>
...
// put three 1s in a device_vector
thrust::device_vector<int> vec(5,0);
vec[1] = 1;
vec[3] = 1;
vec[4] = 1;

// count the 1s
int result = thrust::count(vec.begin(), vec.end(), 1);
// result == 3

If your data does not live inside a device_vector, you can still use thrust::count by wrapping the raw device pointers in thrust::device_ptr.
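As a rough sketch of the raw-pointer case, assuming the array was allocated with cudaMalloc and already holds the data, something like the following should work. The helper name count_ones and its parameters are hypothetical, introduced only for this example.

```cuda
#include <thrust/count.h>
#include <thrust/device_ptr.h>
#include <cuda_runtime.h>

// Hypothetical helper: counts the 1s in a raw device array of n ints.
// d_raw is assumed to point to device memory (e.g. from cudaMalloc).
int count_ones(int* d_raw, int n)
{
    // Wrapping the pointer tells Thrust that it refers to device memory,
    // so the algorithm is dispatched to the GPU.
    thrust::device_ptr<int> first = thrust::device_pointer_cast(d_raw);
    return static_cast<int>(thrust::count(first, first + n, 1));
}
```

The wrapper does not copy or take ownership of the memory, so the caller still frees the array with cudaFree.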

wnbell 2010-10-15 01:32:16

This is a very nice solution as well, but I would rather figure out what is wrong with the code that I have than learn a new library. After I get the basics of what I'm doing down, I'll look at Thrust to make my coding faster.

Andrew Redd 2010-10-15 13:56:01
