I'm interested to know if any common algorithms (sorting, searching, graphs, etc.) have been ported to OpenCL (or any GPU language), and how the performance compares to the same algorithm executed by the CPU. I'm specifically interested in the results (numbers).

Thanks!

+6  A: 

GPUs are highly specialized hardware designed to do a small set of tasks very well and with a high degree of parallelism. Those tasks are basically arithmetic (particularly single-precision floating-point math, although newer GPUs do quite well with double precision). As such, they're only suited to particular algorithms. I'm not sure whether sorting fits that category (in the general case at least).

More common examples are pricing financial instruments, large amounts of matrix maths, and even defeating encryption (by brute force). That being said, I did find "Fast parallel GPU-sorting using a hybrid algorithm".

Another commonly quoted example is running SETI@home on an NVIDIA GPU, but that is comparing apples to oranges. The units of work for GPUs are different (and highly limited) compared to what CPUs ordinarily do.

cletus
+2  A: 

Have a look at Thrust:

Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity.

Dirk Eddelbuettel
Thrust has also just released version 1.1.
Eric
+1  A: 

There are quite a few samples of this sort of thing on NVIDIA's website. Bear in mind that some things, such as sorting, need special algorithms for efficient parallelism, and those algorithms may not be quite as efficient as a non-threaded algorithm on a single core.
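To illustrate the point about sorting, here is a rough sketch of bitonic sort, one well-known data-parallel sorting network. It is my own choice of example, not something taken from NVIDIA's samples, and it is written as plain sequential Python rather than GPU code. The function name `bitonic_sort` and the comparison counter are made up for illustration. Within each pass every compare-exchange is independent of the others, so on a GPU each one could go to its own thread. The price is extra work: for a power-of-two length n it always performs (n/2) * log2(n) * (log2(n) + 1) / 2 comparisons, which grows faster than the roughly n * log2(n) of a good sequential sort.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two.

    Returns (sorted_list, comparisons). Illustrative sketch only:
    the inner loop over i is what a GPU would run in parallel.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")

    comparisons = 0
    k = 2  # size of the bitonic sequences being merged in this stage
    while k <= n:
        j = k // 2  # distance between compared elements in this pass
        while j >= 1:
            # Every iteration of this loop is independent of the others.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    comparisons += 1
                    # Swap when the pair is out of order for its direction.
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data, comparisons


if __name__ == "__main__":
    result, count = bitonic_sort([5, 1, 7, 3, 8, 2, 6, 4])
    print(result, count)
```

On a single core this does more comparisons than a conventional sort, which is the trade-off mentioned above; it only pays off when enough of those comparisons actually run at the same time.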

ConcernedOfTunbridgeWells
A: 

BE WARY, VERY WARY of any performance numbers quoted for GPGPU. Lots of people like to post really impressive numbers that don't take into account the transfer time needed to get the input data from the CPU to the GPU and the output data back, both of which go over the PCIe bottleneck.
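A back-of-the-envelope sketch of why this matters. The function names and every number below are hypothetical, chosen only to show the arithmetic; they are not measurements from any real card or bus. The model simply adds the time to move the input and output across the bus to the kernel time before computing the speedup.

```python
def kernel_only_speedup(cpu_seconds, kernel_seconds):
    """Speedup as often quoted: CPU time versus GPU kernel time alone."""
    return cpu_seconds / kernel_seconds


def end_to_end_speedup(cpu_seconds, kernel_seconds,
                       bytes_in, bytes_out, bus_bytes_per_second):
    """Speedup once host-to-device and device-to-host copies are counted."""
    transfer_seconds = (bytes_in + bytes_out) / bus_bytes_per_second
    return cpu_seconds / (kernel_seconds + transfer_seconds)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    cpu_seconds = 1.0        # assumed CPU run time
    kernel_seconds = 0.02    # assumed GPU kernel time
    bytes_in = 400e6         # assumed 400 MB of input
    bytes_out = 400e6        # assumed 400 MB of output
    bus = 4e9                # assumed effective bus throughput, bytes/s

    print(kernel_only_speedup(cpu_seconds, kernel_seconds))
    print(end_to_end_speedup(cpu_seconds, kernel_seconds,
                             bytes_in, bytes_out, bus))
```

With these made-up inputs the kernel alone looks about 50 times faster, but once the 0.2 seconds of copying is included the end-to-end figure drops to roughly 4.5 times.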

Die in Sente
Thanks, good point.
Christopher
This is true, but many of the examples on NVIDIA's webpage are complete applications and definitely do include these transfer times. The real concern is: how optimized is the CPU version in the benchmark?
Eric