ansaurus

Question

Answer 1

A:

Histogramming is not particularly efficient when implemented with CUDA (or with GPGPU in general) - typically you need to generate lots of partial histograms in shared memory and then sum them. You might want to consider keeping this particular task on the CPU.

Paul R 2010-06-05 09:00:29

However, my task is specifically to compute a histogram using CUDA, and I can't finish it. The data cannot be handled singly.

kitw 2010-06-05 13:50:01

Answer 2

A:

You will have to either use atomic functions to keep other threads from updating the same memory location at the same time, or use partial histograms. Either way it is not that efficient unless the input image is very large.
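
As a rough illustration of the atomic-function route, here is a minimal sketch. It assumes 8-bit input values and 256 bins; the names `histogramGlobalAtomics` and `histogramGlobal` and the launch configuration are made up for this example. Error checking is left out for brevity, and `atomicAdd` on global memory needs a device of compute capability 1.1 or later.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 256;  // assumption: one bin per possible 8-bit value

// Every thread walks the input with a grid-stride loop and increments the
// matching bin with atomicAdd, so concurrent updates to one bin are not lost.
__global__ void histogramGlobalAtomics(const unsigned char* data, size_t n,
                                       unsigned int* bins)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&bins[data[i]], 1u);
}

// Hypothetical host wrapper: copies the input to the device, runs the kernel,
// and returns the 256 bin counts. CUDA error checking is omitted.
std::vector<unsigned int> histogramGlobal(const std::vector<unsigned char>& input)
{
    std::vector<unsigned int> result(NUM_BINS, 0);
    if (input.empty()) return result;

    unsigned char* dData = nullptr;
    unsigned int* dBins = nullptr;
    cudaMalloc(&dData, input.size());
    cudaMalloc(&dBins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(dData, input.data(), input.size(), cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, NUM_BINS * sizeof(unsigned int));

    // 64 blocks of 256 threads is an arbitrary choice for this sketch.
    histogramGlobalAtomics<<<64, 256>>>(dData, input.size(), dBins);

    cudaMemcpy(result.data(), dBins, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dBins);
    return result;
}
```

This is the simplest version to get right. When many threads hit the same bin the atomics serialize, which is where the inefficiency mentioned above comes from.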

sjchoi 2010-06-05 19:51:03

Answer 3

+1 A:

Have you looked at the SDK sample? The "histogram" sample is available in the CUDA SDK (currently version 3.0 on the NVIDIA developer site, version 3.1 beta available for registered developers).

The documentation with the sample explains nicely how to handle your summation, either using global memory atomics on the GPU or by collecting the results for each block separately and then doing a separate reduction (either on the host or the GPU).
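
For orientation, the per-block approach could look roughly like the sketch below. This is not the SDK sample's code, just a simplified illustration that assumes 8-bit input and 256 bins; the kernel and function names (`partialHistograms`, `mergeHistograms`, `histogramPartial`) and the block counts are invented for this example. It uses atomics on shared memory, which need compute capability 1.2 or later, and omits error checking.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int BINS = 256;      // assumption: one bin per possible 8-bit value
constexpr int THREADS = 256;   // threads per block (arbitrary for this sketch)
constexpr int BLOCKS = 32;     // number of partial histograms (arbitrary)

// Stage 1: each block builds a private histogram in shared memory and then
// writes it to its own row of `partial` (BLOCKS rows of BINS counters).
__global__ void partialHistograms(const unsigned char* data, size_t n,
                                  unsigned int* partial)
{
    __shared__ unsigned int local[BINS];
    for (int b = threadIdx.x; b < BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&local[data[i]], 1u);   // contention stays inside the block
    __syncthreads();

    for (int b = threadIdx.x; b < BINS; b += blockDim.x)
        partial[(size_t)blockIdx.x * BINS + b] = local[b];
}

// Stage 2: the separate reduction. One thread per bin sums that bin over
// all per-block histograms.
__global__ void mergeHistograms(const unsigned int* partial, int numBlocks,
                                unsigned int* bins)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= BINS) return;
    unsigned int sum = 0;
    for (int k = 0; k < numBlocks; ++k) sum += partial[(size_t)k * BINS + b];
    bins[b] = sum;
}

// Hypothetical host wrapper that runs both stages and returns the bin counts.
std::vector<unsigned int> histogramPartial(const std::vector<unsigned char>& input)
{
    std::vector<unsigned int> result(BINS, 0);
    if (input.empty()) return result;

    unsigned char* dData = nullptr;
    unsigned int* dPartial = nullptr;
    unsigned int* dBins = nullptr;
    cudaMalloc(&dData, input.size());
    cudaMalloc(&dPartial, (size_t)BLOCKS * BINS * sizeof(unsigned int));
    cudaMalloc(&dBins, BINS * sizeof(unsigned int));
    cudaMemcpy(dData, input.data(), input.size(), cudaMemcpyHostToDevice);

    partialHistograms<<<BLOCKS, THREADS>>>(dData, input.size(), dPartial);
    mergeHistograms<<<1, BINS>>>(dPartial, BLOCKS, dBins);

    cudaMemcpy(result.data(), dBins, BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dPartial);
    cudaFree(dBins);
    return result;
}
```

The second stage could just as well run on the host by copying `dPartial` back and summing the rows there; with only a few dozen partial histograms either choice is cheap.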

Tom 2010-06-08 10:29:56


How to make a CUDA histogram kernel?
