I'm debating whether to learn GP-GPU programming, such as CUDA, or whether to put it off. My problem domain (bioinformatics) is such that it might be nice to know, since a lot of our problems do have massive parallelism, but most people in the field certainly don't know it. My question is, how difficult are the APIs for CUDA and other GP-GPU technologies to use in practice? Is it extremely painful, or is most of the complexity well-encapsulated? Does it feel like "normal" programming, or is the abstraction over running your code on the graphics card leaky to non-existent?
In CUDA you write in C, but you need to know exactly what you are doing to achieve maximum performance. The concepts are not, and should not be, abstracted away, because the GPU works so differently from a CPU. The same is true for CPU SIMD instruction sets such as SSE. From a higher-level perspective, you need to decide what you want to do with the parallel hardware and break your problem up so that it exploits that parallelism efficiently. You also need to know how the GPU does its processing (SIMD style, in groups of threads) and try to minimize branching; the sketch below shows what this looks like in practice.
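To give a feel for it, here is a minimal, hypothetical kernel sketch. The function name, scale factor, and block size are made up for illustration, but the __global__ qualifier, the thread/block indexing, and the <<<...>>> launch syntax are the standard CUDA extensions to C:

```
// Hypothetical example: scale a vector of floats, one GPU thread per element.
// The grid/block indexing is the part with no CPU equivalent.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // which element am I?
    if (i < n)                                      // guard against overrun
        data[i] *= factor;
}

// Launched from host code with something like:
//   scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
// i.e. you explicitly choose how many blocks and threads to run.
```

Note that the if (i < n) guard is about the only "branching" you want in a kernel like this; threads in the same group that take different branches serialize, which is why minimizing divergence matters.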
While you're using C syntax, it's not really abstracted away.
That said, it's far better than writing shaders in HLSL!
It's not really horrible, but remember it's still going to be heavily optimized for floating-point matrix multiplication. For bioinformatics you might also want to look at the Cell supercomputer work, or MapReduce methods.
If you want to start experimenting with GPGPU, I'd recommend looking at the newer OpenCL programming model.
For the past few years, there's been a lot of turmoil among the different programming APIs for GPGPU. CUDA runs only on NVIDIA GPUs. AMD's Stream SDK is roughly similar to CUDA in terms of features, but hasn't been around as long, hasn't captured the same mind share as CUDA, and runs only on AMD (ATI) GPUs.
Microsoft's DirectX 11 compute shader should work on any brand of GPU, but of course it will only run on Windows Vista or Windows 7, not Linux and not Windows XP. It's currently available as a "technology preview" in the DirectX SDK.
OpenCL is new and still being implemented, but it should eventually work across all operating systems and brands of GPU hardware, so the buzz is that this will be the evolutionary survivor as CUDA and the other proprietary libraries die off.
(Predictions of the future not guaranteed :-)
I've just started looking at this myself. My initial impression is that it is not too hard to learn, certainly no worse than learning MPI, and perhaps a little simpler. Getting the best results out of it will still require some understanding of the memory model, but I think you can get considerable benefit just by doing the obvious things (a rough sketch of those is below). My entirely subjective impression is that this technology is about to reach critical mass in bioinformatics, and pretty quickly everybody will be doing it. Take a look at the CUDA tutorials on the NVIDIA site. Have you looked at this BMC Bioinformatics paper: CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment?
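For concreteness, the "obvious things" amount to: copy your data to the card, launch a kernel over many threads, copy the results back. A hedged sketch follows; the square kernel, array size, and block size are made up for illustration, while cudaMalloc, cudaMemcpy, and the launch syntax are the standard CUDA runtime API:

```
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Illustrative kernel: square each element (hypothetical workload).
__global__ void square(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);          // host buffer
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, bytes);                      // allocate on the GPU
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    square<<<(n + 255) / 256, 256>>>(d, n);     // launch ~1M threads
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

    printf("h[3] = %f\n", h[3]);                // expect 9.0
    cudaFree(d);
    free(h);
    return 0;
}
```

The "understanding of the memory model" part is mostly about making neighbouring threads read neighbouring addresses (as this kernel happens to do) and about using shared memory when threads need to reuse each other's data; you can get quite far before you need more than that.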