views: 239
answers: 3

Hey All

First of all:

  • I am well aware that OpenCL does not magically make everything faster
  • I am well aware that OpenCL has limitations

So, on to my question: I regularly do various kinds of scientific calculation in code. Some of what I work with is quite intense in terms of complexity and the sheer number of calculations. So I was wondering whether I could speed things up by using OpenCL.

So, what I would love to hear from you all are answers to some of the following [bonus for links]:

*What kinds of calculations/algorithms/general problems are suitable for OpenCL?

*What are the general guidelines for determining whether a particular piece of code would benefit from migrating to OpenCL?

Regards

+2  A: 

It's well suited to tasks that can be expressed as a somewhat small program working in parallel over large chunks of simple data structures.

If you want to compute the difference between two images, OpenCL is for you. If you want to ray-trace a scene, it's somewhat difficult but still feasible. If you have to answer large numbers of web-service requests, OpenCL is not the solution.
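To give a concrete flavour of that first case, here is a minimal sketch of what an image-difference kernel could look like in OpenCL C. The kernel name, the flat float-array layout, and the parameter names are my own illustrative assumptions, not anything from a specific library:

    // diff.cl -- hypothetical kernel: per-pixel absolute difference of two images.
    // Each work-item handles exactly one pixel and never talks to its neighbours,
    // which is what makes this an ideal OpenCL workload.
    __kernel void image_diff(__global const float *a,
                             __global const float *b,
                             __global float *out,
                             const int n_pixels)
    {
        int i = get_global_id(0);       // unique index of this work-item
        if (i < n_pixels)               // guard in case the global size is padded
            out[i] = fabs(a[i] - b[i]);
    }

On the host side you would enqueue one work-item per pixel; all of the speedup comes from the fact that every pixel is completely independent.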

Malte Clasen
+6  A: 

I think this is a good question, and it's something I'm trying to work out for my own research as well.

There are, at the moment, strong limitations on what GPUs can do, as they require individual threads to execute exactly the same code on different sets of data, i.e. the problem/algorithm must be "data parallel". Obviously data-parallel problems include Monte Carlo simulations (where many MC simulations are executed in parallel) and image processing, and, less obviously, molecular dynamics simulations. Numerical integration (Monte Carlo or otherwise) is another scientific application which can easily be ported to run on a GPU.
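To make the Monte Carlo case concrete, here is a minimal sketch of an OpenCL C kernel estimating pi, where every work-item runs its own independent batch of trials. The kernel name, the per-work-item seeding, and the simple LCG (which is far too weak for serious work) are my own illustrative assumptions:

    // mc_pi.cl -- hypothetical kernel: each work-item runs its own independent
    // batch of Monte Carlo trials, so the work-items never need to communicate.
    __kernel void mc_pi(__global uint *hits, const uint trials_per_item)
    {
        uint gid   = get_global_id(0);
        uint state = gid * 2654435761u + 1u;   // crude per-work-item seed (assumption)
        uint count = 0;

        for (uint t = 0; t < trials_per_item; ++t) {
            state = state * 1664525u + 1013904223u;   // LCG step, illustrative only
            float x = (float)state / 4294967296.0f;
            state = state * 1664525u + 1013904223u;
            float y = (float)state / 4294967296.0f;
            if (x * x + y * y <= 1.0f)
                ++count;                              // point landed inside the quarter circle
        }
        hits[gid] = count;   // host sums the partial counts and multiplies by 4/N
    }

The host just sums the per-work-item counts at the end; the expensive inner loop runs entirely in parallel.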

The other main restriction is that memory per thread is very limited, and so to be efficiently executed on a GPU the algorithm must have high arithmetic intensity. A necessary but not sufficient condition for an algorithm to be a candidate for running on a GPU is that on the CPU the algorithm must be strongly CPU bound rather than memory bound.
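As a rough illustration of arithmetic intensity, compare the two hypothetical kernels below (names and the iteration count are my own assumptions; both assume the global size matches the array length). The first does about one arithmetic operation per 12 bytes moved and will be limited by memory bandwidth; the second does thousands of operations per 8 bytes moved and can keep the arithmetic units busy:

    // Memory bound: ~1 arithmetic op per 12 bytes of global memory traffic.
    __kernel void vec_add(__global const float *a, __global const float *b,
                          __global float *c)
    {
        int i = get_global_id(0);
        c[i] = a[i] + b[i];
    }

    // Compute bound: ~3*steps arithmetic ops per 8 bytes of traffic
    // (e.g. steps ~ 1000), so memory latency matters far less.
    __kernel void iterate_map(__global float *x, const int steps)
    {
        int i = get_global_id(0);
        float v = x[i];
        for (int s = 0; s < steps; ++s)
            v = 4.0f * v * (1.0f - v);   // logistic map iteration
        x[i] = v;
    }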

My view is that as time goes on, more and more problems will be shoehorned so that they can be solved using this paradigm just because there is such a large performance gain to be made, but the low hanging fruit are the obviously data parallel problems. Massively multicore programming is, in my view, going to be increasingly important and prevalent in scientific circles over the next decade.

I've played around with this a bit, and managed to shoehorn a backtracking problem into an appropriate format for executing on a GPU (using CUDA). FYI, I describe this in a talk: http://lattice.complex.unimelb.edu.au/home/sites/default/files/mydocuments/clisby_cuda0509.pdf

Nathan
+1  A: 

As far as algorithms are concerned, they have to be data parallel. That is, one set of data should not have dependencies on previous sets. To draw an analogy, consider insertion sort, where one element is compared with the other elements to find its right place. This is "not" data parallel, as each step needs to access the other N-1 elements and depends on the result of the previous step. Now, if you need to sort using OpenCL, you would have to implement something like bitonic sort, which is a type of sorting network.
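For illustration, here is a minimal sketch of a single compare-and-swap stage of a bitonic sorting network in OpenCL C. The kernel and parameter names are my own assumptions; the host would enqueue this kernel O(log^2 n) times with the appropriate (stage_size, pass_dist) pairs, and the array length is assumed to be a power of two:

    // bitonic_stage.cl -- hypothetical kernel: one stage of a bitonic sorting
    // network. Within a single launch every work-item touches a disjoint pair of
    // elements, so each stage is data parallel even though sorting as a whole is not.
    __kernel void bitonic_stage(__global float *data,
                                const uint pass_dist,    // distance between compared elements
                                const uint stage_size)   // size of the current bitonic block
    {
        uint i       = get_global_id(0);
        uint partner = i ^ pass_dist;        // element this work-item compares against

        if (partner > i) {                   // each pair is handled exactly once
            bool ascending = ((i & stage_size) == 0);
            float a = data[i];
            float b = data[partner];
            if (ascending == (a > b)) {      // swap if the pair violates the ordering
                data[i]       = b;
                data[partner] = a;
            }
        }
    }

The point of the exercise is exactly the one made above: the sequential dependency of insertion sort is replaced by a fixed network of independent compare-and-swap operations.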

Even if they are data parallel, there is the question of the tradeoff between FLOPS and memory latency. If each piece of data has to be fetched from global memory, the performance improvement might not be significant. A GPU's memory latencies are far higher than a CPU's. To counter this, GPUs have local memories that can be utilized.
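A common pattern for using that local memory is a work-group reduction: each work-group stages its slice of the input in __local memory once, then does all further arithmetic there. The sketch below is a hypothetical sum reduction (names are my own; it assumes the work-group size is a power of two and that the host passes a __local buffer of one float per work-item via clSetKernelArg):

    // reduce_sum.cl -- hypothetical kernel: each work-group reads its slice of the
    // input from global memory exactly once, then reduces it in fast local memory.
    __kernel void reduce_sum(__global const float *in,
                             __global float *partial_sums,
                             __local  float *scratch,     // one float per work-item
                             const uint n)
    {
        uint gid = get_global_id(0);
        uint lid = get_local_id(0);

        scratch[lid] = (gid < n) ? in[gid] : 0.0f;   // single global read per work-item
        barrier(CLK_LOCAL_MEM_FENCE);

        // Tree reduction entirely in local memory.
        for (uint offset = get_local_size(0) / 2; offset > 0; offset /= 2) {
            if (lid < offset)
                scratch[lid] += scratch[lid + offset];
            barrier(CLK_LOCAL_MEM_FENCE);
        }

        if (lid == 0)                                 // one partial result per work-group
            partial_sums[get_group_id(0)] = scratch[0];
    }

The host then sums the small array of per-group partial results (or runs the same kernel again on it), so the slow global memory is touched as little as possible.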