tags:
views: 474
answers: 4

CUDA vs DirectX 10 for parallel mathematics. Any thoughts you have about it?

A: 

Well, CUDA is portable... That's a big win if you ask me...

dicroce
CUDA is portable from Windows to Linux -- or so I understand -- but not from NVidia GPUs to ATI.
Die in Sente
A: 

I find CUDA awkward. It's not C, but a subset of it. It doesn't support double precision floating point natively; that is emulated instead. For single precision it's okay, though. Whether it's worth using depends on the type of task you throw at it: you have to spend more time computing in parallel than you spend moving the data around for it to pay off. But that issue is not unique to CUDA.
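
To make the transfer-cost point concrete, here is a minimal single-precision sketch (the kernel name, sizes, and values are made up for illustration). The two cudaMemcpy calls are pure overhead, so the kernel has to do enough arithmetic per element to justify them:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical single-precision SAXPY kernel: y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);

    // These copies are pure overhead: the kernel must do enough
    // work per element to amortize the time spent moving data.
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```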

I'd wait for Apple's OpenCL, which seems like it will become the industry standard for parallel computing.

Are you sure about the double precision?
Die in Sente
+3  A: 

CUDA is probably the better option if you know your target architecture uses nVidia chips. You have complete control over your data transfers, instruction paths, and order of operations. You can also get by with a lot fewer __syncthreads() calls when you're working at the lower level.

DirectX 10 will be easier to interface with, I should think, but if you really want to push for speed, you have to bypass that extra layer. DirectX 10 also won't know when to use texture memory versus constant memory versus shared memory as well as you will, depending on your particular algorithm.
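
To illustrate the kind of low-level control being described, here is a rough sketch of a block-wise sum that explicitly stages data in shared memory; the kernel name and the fixed block size of 256 are assumptions for the example. The programmer decides what lives in shared memory and exactly where __syncthreads() is required, which is precisely the choice a higher-level API would make for you:

```cuda
// Sketch only: assumes the kernel is launched with 256 threads per block.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float tile[256];          // explicitly placed in shared memory
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                     // one barrier after staging

    // Tree reduction within the block; each step halves the active threads.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = tile[0];       // one partial sum per block
}
```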

If you have access to a Tesla C1060 or something like that, CUDA is by far the better choice. You can really speed things up if you know the specifics of your GPGPU; I've seen a 188x speedup in one particular algorithm on a Tesla versus my desktop.

Mike
A: 

CUDA has nothing to do with whether double precision floating-point operations are supported. That depends on the hardware available. The 9, 100, and 200 series, as well as the Tesla series, support double precision floating-point operations.
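
You can check this at runtime rather than guessing from the product name. As a rough sketch, cudaGetDeviceProperties reports the compute capability, and native double precision starts at compute capability 1.3 (GTX 200 / Tesla C1060 class hardware):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0

    // Native double precision requires compute capability 1.3 or higher.
    bool hasDouble = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
    printf("%s: compute capability %d.%d, double precision %s\n",
           prop.name, prop.major, prop.minor,
           hasDouble ? "supported" : "not supported");
    return 0;
}
```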

Edison Gustavo Muenz