Hello, I'm looking for source code that implements 3D convolution. Ideally, I need C++ or CUDA code. I'd appreciate it if anybody could point me to a nice, fast implementation :-)

Cheers

+3  A: 

you understand that convolution is normally done by using an fft? see, for example, http://en.wikipedia.org/wiki/Convolution

so you need an fft library.

http://stackoverflow.com/questions/1548809/fastest-method-to-compute-convolution suggests http://www.fftw.org/ (for a traditional cpu).

for cuda, use cufft - http://www.gsic.titech.ac.jp/~ccwww/tebiki/tesla%5Fe/tesla6%5Fe.html
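
To make the FFT route concrete, here is a minimal sketch of the convolution theorem in 1D: zero-pad both signals, transform, multiply the spectra, and transform back. It assumes a naive O(N^2) DFT as a stand-in for a real FFT library, and the names `dft` and `dft_convolve` are hypothetical helpers, not part of FFTW or cuFFT. In 3D the same idea applies with a 3D transform (FFTW and cuFFT both provide 3D plans, e.g. `fftw_plan_dft_3d` and `cufftPlan3d`).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(N^2) DFT, standing in for a real FFT library (assumption for this sketch).
// sign = -1 gives the forward transform, sign = +1 the unnormalised inverse.
cvec dft(const cvec& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            // Reduce k*t modulo n to keep the angle small and accurate.
            const double angle = sign * 2.0 * pi * static_cast<double>((k * t) % n)
                                 / static_cast<double>(n);
            sum += x[t] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Linear convolution via the convolution theorem. Both inputs are zero-padded
// to length a.size() + b.size() - 1 so the circular wrap-around does not
// contaminate the result.
std::vector<double> dft_convolve(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    assert(!a.empty() && !b.empty());
    const std::size_t n = a.size() + b.size() - 1;
    cvec fa(n), fb(n);  // std::complex<double> default-constructs to zero
    for (std::size_t i = 0; i < a.size(); ++i) fa[i] = a[i];
    for (std::size_t i = 0; i < b.size(); ++i) fb[i] = b[i];

    fa = dft(fa, -1);
    fb = dft(fb, -1);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= fb[i];  // pointwise product of spectra

    const cvec c = dft(fa, +1);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = c[i].real() / static_cast<double>(n);
    return out;
}
```

With a real FFT the cost drops from O(N^2) to O(N log N), which is where the advantage over direct convolution for large kernels comes from.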

andrew cooke
For small kernels it can sometimes be faster to use matrix convolution, in cases where there is hardware to support it (eg, a GPU for 4x4 or 8x8 kernels). For big kernels, Fourier is da man for sure.
Crashworks
FWIW, the original source for cufft docs is here: http://www.nvidia.com/object/cuda_develop.html
Steve Fallows
A: 

Are you a registered developer? If so you should download the 3.0 SDK and check out the FDTD3d sample which shows a 3d convolution as applied for an explicit finite differences app. In the 2.3 SDK there was a sample called 3dfd which was similar (and has now been replaced).

It may be more efficient to use this approach rather than FFT if your impulse response is short.
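
For reference, the direct (non-FFT) approach can be sketched in plain C++ as below. This is not the SDK sample's code: the `Volume` type and `convolve3d` function are hypothetical names for this illustration, and zero padding at the borders is an assumption. The cost is proportional to the number of voxels times the number of kernel taps, which is why it is attractive only for short impulse responses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense 3D volume stored with x varying fastest (hypothetical helper type).
struct Volume {
    int nx, ny, nz;
    std::vector<float> data;
    Volume(int nx_, int ny_, int nz_, float fill = 0.0f)
        : nx(nx_), ny(ny_), nz(nz_),
          data(static_cast<std::size_t>(nx_) * ny_ * nz_, fill) {}
    float& at(int x, int y, int z) {
        return data[(static_cast<std::size_t>(z) * ny + y) * nx + x];
    }
    float at(int x, int y, int z) const {
        return data[(static_cast<std::size_t>(z) * ny + y) * nx + x];
    }
};

// Direct 3D convolution producing an output the same size as the input.
// Samples outside the input are treated as zero. The kernel must have odd
// extents so that it has a well-defined centre.
Volume convolve3d(const Volume& in, const Volume& k) {
    assert(k.nx % 2 == 1 && k.ny % 2 == 1 && k.nz % 2 == 1);
    const int rx = k.nx / 2, ry = k.ny / 2, rz = k.nz / 2;
    Volume out(in.nx, in.ny, in.nz);
    for (int z = 0; z < in.nz; ++z)
        for (int y = 0; y < in.ny; ++y)
            for (int x = 0; x < in.nx; ++x) {
                float sum = 0.0f;
                for (int kz = 0; kz < k.nz; ++kz)
                    for (int ky = 0; ky < k.ny; ++ky)
                        for (int kx = 0; kx < k.nx; ++kx) {
                            // Convolution flips the kernel: the tap at offset
                            // (q - r) reads the input at p - (q - r).
                            const int sx = x - (kx - rx);
                            const int sy = y - (ky - ry);
                            const int sz = z - (kz - rz);
                            if (sx < 0 || sx >= in.nx || sy < 0 || sy >= in.ny ||
                                sz < 0 || sz >= in.nz)
                                continue;  // zero padding
                            sum += k.at(kx, ky, kz) * in.at(sx, sy, sz);
                        }
                out.at(x, y, z) = sum;
            }
    return out;
}
```

Convolving a unit impulse with a kernel reproduces the kernel around the impulse position, which is a handy sanity check for the indexing.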

Tom
You can register at http://www.nvidia.com/object/cuda_get.html, click "Apply Now". Alternatively, you can just look at the 3dfd sample in the current SDK, the concepts remain the same.
Tom
A: 

Hello. Actually, I'm planning to use a kernel with small support (3x3 at first, probably 7x7 later on), so direct convolution might be faster than using the FFT. Either way, I can use the FFTW library or the CUDA library. Does anyone know what speed-up the CUDA code gives?

I'm not a registered developer, so I don't have access to the 3.0 SDK. Can you point me to the web page where I can register?

Thanks

This should probably have been two comments on the answers, rather than a whole new answer! Apart from anything else, the original responders would then have been alerted to your follow-up question.
Tom
crashworks suggested using a gpu directly for small kernels. i have no experience with that, but i think you'd need to use opengl or similar rather than opencl (because the latter exposes a generic c-like interface) and you're probably restricted to floats (also true of opencl on some devices).
andrew cooke
this looks like a good article on explicit convolution using opencl - http://developer.amd.com/gpu/ATIStreamSDK/ImageConvolutionOpenCL/Pages/ImageConvolutionUsingOpenCL.aspx (it probably has timing measurements somewhere).
andrew cooke
i was wrong above. at least, cuda supports convolution with texture memory, so i suspect opencl will too.
andrew cooke
A: 

Intel has a very good example: an SSE + OpenMP implementation alongside a serial version of the same code. The code is primarily meant for profiling the serial approach against the parallel one, but it is nicely done. http://software.intel.com/en-us/articles/16bit-3d-convolution-sse4openmp-implementation-on-penryn-cpu/
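
To give a feel for the serial versus parallel comparison, here is a rough sketch of a 16-bit 3D convolution with an optional OpenMP loop. It is not Intel's code and uses no SSE intrinsics; the function name `convolve3d_i16`, the cubic kernel, and the choice to leave border voxels at zero are all assumptions made for this illustration. Build with OpenMP enabled (for example `-fopenmp` on GCC or Clang) to get the parallel path; without it the pragma is skipped and both modes run serially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convolve a 16-bit volume (x varying fastest) with a cubic kernel of odd
// side `ks`, accumulating in 32 bits. Only interior voxels, where the kernel
// fits entirely inside the volume, are computed; the border stays at 0.
// Pass parallel = true to let OpenMP split the z/y loops across threads.
std::vector<std::int32_t> convolve3d_i16(const std::vector<std::int16_t>& in,
                                         int nx, int ny, int nz,
                                         const std::vector<std::int16_t>& k, int ks,
                                         bool parallel) {
    assert(ks % 2 == 1);
    assert(in.size() == static_cast<std::size_t>(nx) * ny * nz);
    assert(k.size() == static_cast<std::size_t>(ks) * ks * ks);
    (void)parallel;  // unused when compiled without OpenMP
    const int r = ks / 2;
    std::vector<std::int32_t> out(in.size(), 0);

#ifdef _OPENMP
#pragma omp parallel for collapse(2) if(parallel)
#endif
    for (int z = r; z < nz - r; ++z) {
        for (int y = r; y < ny - r; ++y) {
            for (int x = r; x < nx - r; ++x) {
                std::int32_t sum = 0;
                for (int dz = -r; dz <= r; ++dz)
                    for (int dy = -r; dy <= r; ++dy)
                        for (int dx = -r; dx <= r; ++dx) {
                            // Kernel tap at offset d multiplies the input at p - d.
                            const std::size_t ki =
                                (static_cast<std::size_t>(dz + r) * ks + (dy + r)) * ks + (dx + r);
                            const std::size_t ii =
                                (static_cast<std::size_t>(z - dz) * ny + (y - dy)) * nx + (x - dx);
                            sum += static_cast<std::int32_t>(k[ki]) * in[ii];
                        }
                // Each output voxel is written by exactly one iteration, so no locking is needed.
                out[(static_cast<std::size_t>(z) * ny + y) * nx + x] = sum;
            }
        }
    }
    return out;
}
```

Timing the two modes on the same input (for example with `std::chrono`) gives a simple serial versus parallel profile in the spirit of the article.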

Sayan Ghosh