So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed!
Here's how it performs when rendering the Mandelbrot set at 1280 x 678 pixels on my home PC, which has a Q6600 and a GeForce 8800GTS (max of 1000 iterations):
Maxing out all 4 CPU cores with OpenMP: 2.23 fps
Running the same algorithm on my GPU: 104.7 fps
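For anyone who wants to see the shape of the algorithm being timed, here is a minimal sketch in C of an escape-time renderer with an OpenMP loop over rows. It is not my exact code: the function names (mandel_iters, render), the viewport of [-2.5, 1.0] x [-1.25, 1.25], and the dynamic schedule are all assumptions made for illustration. The point is that every pixel is computed independently, which is why the same loop body could be assigned to one GPU thread per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Escape-time count for c = cr + ci*i.
   Returns max_iter if the orbit has not escaped after max_iter steps. */
int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;  /* real part of z^2 + c */
        zi = 2.0 * zr * zi + ci;            /* imaginary part of z^2 + c */
        zr = t;
        n++;
    }
    return n;
}

/* Fills out[y * width + x] with iteration counts.
   The viewport [-2.5, 1.0] x [-1.25, 1.25] is an assumed example region. */
void render(int *out, int width, int height, int max_iter)
{
    assert(out != NULL && width > 0 && height > 0);

    /* Rows are independent, so they can be handed to different cores.
       Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            double cr = -2.5 + 3.5 * x / width;
            double ci = -1.25 + 2.5 * y / height;
            out[y * width + x] = mandel_iters(cr, ci, max_iter);
        }
    }
}
```

With gcc this would be built with something like -fopenmp to use all cores. A dynamic schedule is chosen here because rows that cross the set take many more iterations than rows that escape quickly.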
And here's how fast I got it to render the whole set at 8192 x 8192 with a max of 1000 iterations:
Serial implementation on my home PC: 81.2 seconds
All 4 CPU cores on my home PC (OpenMP): 24.5 seconds
32 processors on my school's supercomputer (MPI with master-worker): 1.92 seconds
My home GPU (CUDA): 0.310 seconds
4 GPUs on my school's supercomputer (CUDA with static output decomposition): 0.0547 seconds
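To make "static output decomposition" concrete, here is a rough sketch in C of one way to split the image between workers up front: each worker gets a fixed, contiguous band of rows before any computation starts. The names (row_range, static_rows) are hypothetical, and splitting by contiguous row bands is an assumption; the output could just as well be divided by tiles or interleaved rows.

```c
#include <assert.h>

/* Half-open range of image rows [begin, end) owned by one worker. */
typedef struct {
    int begin;
    int end;
} row_range;

/* Splits `height` rows into `workers` contiguous bands.
   When height is not divisible by workers, the first
   (height % workers) bands each get one extra row. */
row_range static_rows(int height, int workers, int rank)
{
    assert(height >= 0 && workers > 0 && rank >= 0 && rank < workers);

    int base = height / workers;
    int extra = height % workers;

    row_range r;
    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

For the 8192-row image on 4 GPUs, this gives each device 2048 rows. Unlike the master-worker scheme used in the MPI run, nothing is rebalanced at run time, so bands that cover more of the set's interior take longer than the others.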
So here's my question: if we can get such huge speedups by programming the GPU instead of the CPU, why is nobody doing it? I can think of so many things we could speed up like this, and yet I don't know of many commercial apps that actually do it.
Also, what kinds of other speedups have you seen by offloading your computations to the GPU?