
Hi,

I've worked on many data matching problems, and very often they boil down to running many instances of CPU-intensive algorithms such as Hamming or edit distance, quickly and in parallel. Is this the kind of thing CUDA would be useful for?
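A minimal sketch of why this workload fits, assuming records are bit-packed into 64-bit words (the question does not say how records are stored). Hamming distance is then an XOR followed by a population count, and comparing one query against many records is a loop whose iterations never depend on each other. This is plain C++ running on the CPU. The names `hamming64` and `hamming_all` are hypothetical. On a GPU the loop body would become a kernel in which each thread handles one index `i`, and CUDA's `__popcll` intrinsic would replace `std::bitset::count`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hamming distance between two 64-bit packed records (hypothetical layout):
// XOR marks the differing bits, and the population count adds them up.
inline int hamming64(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Compare one query against every record. Iteration i reads only
// records[i] and writes only out[i], so all iterations are independent
// and each could be given to its own GPU thread.
std::vector<int> hamming_all(std::uint64_t query,
                             const std::vector<std::uint64_t>& records) {
    std::vector<int> out(records.size());
    for (std::size_t i = 0; i < records.size(); ++i) {
        out[i] = hamming64(query, records[i]);
    }
    return out;
}
```

For example, `hamming_all(0b1011, {0b1011, 0b0000, 0b0110})` returns `{0, 3, 3}`.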

What kinds of data processing problems have you solved with it? Is there really an uplift over a standard quad-core Intel desktop?

Chris

+1  A: 

For one, at SIGGRAPH '09 they showed a CUDA implementation of V-Ray for Maya. Real-time ray tracing at preview quality, 20 fps, on a $200 card? I think it helps greatly.

Xavier Ho
+1  A: 

Yes, this is CUDA's main domain. Its efficiency is highest when the following conditions hold:

  1. Processing one element does not depend on the results of processing another.
  2. There is no branching, or at least adjacent elements take the same branch.
  3. Elements are laid out regularly (ideally contiguously) in memory, so adjacent threads read adjacent addresses.
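A small sketch of a computation that meets all three conditions, written in C++ with a CPU loop standing in for a kernel launch. The `PointsSoA` type and the function names are hypothetical and exist only to illustrate the list. The body for index `i` reads and writes only element `i` (condition 1). It contains no `if`/`else`, so every index follows the same path (condition 2). Each field lives in its own contiguous array, a structure-of-arrays layout, so neighbouring indices touch neighbouring addresses (condition 3).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure-of-arrays layout: each field is its own contiguous array,
// so x[i] and x[i + 1] sit next to each other in memory.
struct PointsSoA {
    std::vector<float> x;
    std::vector<float> y;
};

// Body of one "thread": reads only element i, writes only out[i],
// and has no if/else, so all indices execute the same instructions.
inline void squared_norm_body(std::size_t i, const PointsSoA& p,
                              std::vector<float>& out) {
    out[i] = p.x[i] * p.x[i] + p.y[i] * p.y[i];
}

// CPU stand-in for a kernel launch: on a GPU each i would be one thread.
std::vector<float> squared_norms(const PointsSoA& p) {
    assert(p.x.size() == p.y.size());
    std::vector<float> out(p.x.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        squared_norm_body(i, p, out);
    }
    return out;
}
```

Storing the same data as an array of `{x, y}` structs would still give correct results, but adjacent threads reading `x` would then skip over every `y`. That is the kind of strided access condition 3 warns against.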

Of course, very few real tasks meet all of these conditions, and the further a task strays from them, the lower the efficiency. Sometimes you need to completely rewrite your algorithm to make full use of the GPU.
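Edit distance, mentioned in the question, is a case where such a rewrite helps. In the usual dynamic-programming table each cell depends on its left, upper and upper-left neighbours, which breaks condition 1 if you fill the table row by row. One common restructuring is to sweep the table by anti-diagonals. It is named here as an illustration, not as what this answer's author had in mind. Every cell with the same `i + j` depends only on the two previous diagonals, so all cells on one diagonal can be computed at once. Below is a minimal single-threaded C++ sketch with a hypothetical function name. The inner loop is the part a GPU would spread across threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance, filled one anti-diagonal (d = i + j) at a time.
// Cell (i, j) needs (i-1, j) and (i, j-1) from diagonal d-1 and
// (i-1, j-1) from diagonal d-2, so cells on the same diagonal are independent.
int edit_distance_wavefront(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<int>> D(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 0; i <= m; ++i) D[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= n; ++j) D[0][j] = static_cast<int>(j);

    for (std::size_t d = 2; d <= m + n; ++d) {
        // Rows i on this diagonal with 1 <= i <= m and 1 <= d - i <= n.
        const std::size_t i_lo = (d > n) ? d - n : 1;
        const std::size_t i_hi = std::min(m, d - 1);
        // No iteration of this loop reads a cell written by another one.
        for (std::size_t i = i_lo; i <= i_hi; ++i) {
            const std::size_t j = d - i;
            const int subst = D[i - 1][j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
            D[i][j] = std::min({D[i - 1][j] + 1, D[i][j - 1] + 1, subst});
        }
    }
    return D[m][n];
}
```

`edit_distance_wavefront("kitten", "sitting")` gives 3, the same as the row-by-row algorithm. Only the evaluation order changes. For a large batch of short string pairs, giving each thread one whole pair may be simpler. The wavefront matters more when individual strings are long.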

Andrey
+4  A: 

I think you've answered your own question. In general, CUDA/OpenCL accelerates massively parallel operations. We've used CUDA for various DSP operations (FFT, FIR) and seen order-of-magnitude speedups. An order-of-magnitude speedup for a couple hundred dollars is a steal. While CPU-side tools like MKL and OpenMP have given us a sizeable speed increase, CUDA/OpenCL is much faster.
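As an illustration of why FIR filtering maps well to the GPU, here is a generic textbook direct-form filter, not the code the answer refers to. Each output sample is a dot product of the filter taps with a window of the input, and no output depends on another. The function name `fir_valid` is hypothetical. The sketch computes only the "valid" outputs, where the whole window fits inside the input, which keeps the loop body free of boundary checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-form FIR filter, "valid" outputs only (no zero padding):
// y[n] = sum_k h[k] * x[n + K - 1 - k], for n = 0 .. N - K.
// Each y[n] reads a window of x and writes one output, so outputs are
// independent and each could be computed by its own GPU thread.
std::vector<float> fir_valid(const std::vector<float>& x,
                             const std::vector<float>& h) {
    assert(!h.empty() && x.size() >= h.size());
    const std::size_t K = h.size();
    std::vector<float> y(x.size() - K + 1);
    for (std::size_t n = 0; n < y.size(); ++n) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < K; ++k) {
            acc += h[k] * x[n + K - 1 - k];
        }
        y[n] = acc;
    }
    return y;
}
```

With `x = {1, 2, 3, 4}` and `h = {1, 1}` it returns `{3, 5, 7}`.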

Check here for examples of CUDA usage.

basszero
+1  A: 

CUDA has been used to vastly speed up computed tomography. The FASTRA project, for instance, performs on par with supercomputers (not just quad-core desktops!) while being built from consumer-grade hardware for a few thousand euros.

Other research topics I'm aware of are swarm optimization and real-time audio processing.

In general, the technique can be used in any domain where all data must be processed the same way, since all cores perform the same operation. If your problem boils down to this kind of operation, you're good to go :). Too bad not everything falls into this category...

Pieter