I'm just learning OpenCL and I'm at the point of trying to launch a kernel. Why are GPU threads managed in a grid? I'm going to read about this in more detail, but a simple explanation would be nice. Is it always like this when working with GPGPUs?

A: 

The simple answer is that GPUs are designed to process images and textures that are 2D grids of pixels. When you render a triangle in DirectX or OpenGL, the hardware rasterizes it into a grid of pixels.
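
To make the rasterization point concrete, here is a simplified Python sketch of edge-function coverage testing. All function names are hypothetical. Every pixel decides on its own whether its centre lies inside the triangle, so each pixel could be handled by its own thread. This illustrates the idea only and is not how DirectX, OpenGL or any particular GPU actually rasterizes. For example, it counts pixels lying exactly on an edge as covered instead of applying a tie-breaking rule.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: > 0 if (px, py) is left of the edge a->b (y-up convention)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return a height x width grid with 1 where the pixel centre is inside tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = [[0] * width for _ in range(height)]
    for y in range(height):            # on a GPU these two loops would become
        for x in range(width):         # a 2D grid of independent threads
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # inside if all three edge functions agree in sign (accepts either winding)
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered[y][x] = 1
    return covered
```

For the triangle (0, 0), (8, 0), (0, 8) on an 8 x 8 grid, this marks the 36 pixels whose centres satisfy x + y <= 8.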

Die in Sente
A:

This is a common approach, used in CUDA, OpenCL and, I think, ATI Stream.

The idea behind the grid is to provide a simple, but flexible, mapping between the data being processed and the threads doing the data processing. In the simple version of the GPGPU execution model, one GPU thread is "allocated" for each output element in a 1D, 2D or 3D grid of data. To process this output element, the thread will read one (or more) elements from the corresponding location or adjacent locations in the input data grid(s). By organizing the threads in a grid, it's easier for the threads to figure out which input data elements to read and where to store the output data elements.
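
Here is a minimal sketch of that model, emulated on the CPU in Python. `launch_2d` and `blur_kernel` are hypothetical names, and the "threads" run one after another rather than in parallel. `gid` stands in for what `get_global_id(0)` and `get_global_id(1)` would return inside an OpenCL kernel. Each call writes exactly one output element and reads its own location plus the adjacent ones in the input grid.

```python
def launch_2d(kernel, global_size, *args):
    """Call kernel once per point of a width x height grid (sequentially here)."""
    width, height = global_size
    for y in range(height):
        for x in range(width):
            kernel((x, y), *args)

def blur_kernel(gid, src, dst):
    """One 'thread': compute one output element from its horizontal neighbours."""
    x, y = gid
    width = len(src[y])
    left = src[y][max(x - 1, 0)]           # clamp at the left edge
    right = src[y][min(x + 1, width - 1)]  # clamp at the right edge
    dst[y][x] = left + src[y][x] + right   # 3-tap sum, no normalisation

src = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
dst = [[0] * 4 for _ in range(2)]
launch_2d(blur_kernel, (4, 2), src, dst)   # dst == [[4, 6, 9, 11], [16, 18, 21, 23]]
```

The grid position is the only per-thread information, so the kernel body needs no loop over the data. The launch supplies that loop.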

This contrasts with the common multi-core CPU threading model, where one thread is allocated per CPU core and each thread processes many input and output elements (e.g. 1/4 of the data on a quad-core system).
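
For comparison, here is a sketch of that CPU-style split, assuming a quad-core machine: four workers, each looping over a contiguous chunk of roughly a quarter of the data. `chunk_bounds` and `square_slice` are hypothetical helpers. Because of CPython's global interpreter lock, these threads illustrate the partitioning rather than deliver a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_bounds(n, workers):
    """Split range(n) into `workers` contiguous [start, stop) slices of near-equal size."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def square_slice(data, out, start, stop):
    # Each worker loops over many elements, unlike a GPU work-item.
    for i in range(start, stop):
        out[i] = data[i] * data[i]

data = list(range(10))
out = [0] * len(data)
with ThreadPoolExecutor(max_workers=4) as pool:   # one worker per (assumed) core
    for start, stop in chunk_bounds(len(data), 4):
        pool.submit(square_slice, data, out, start, stop)
# Leaving the with-block waits for all workers to finish.
```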

Eric
As whatnick said, it's more than just making it easy for threads - the hardware requires this organization to execute single instructions for multiple data (SIMD).
RD1
The hardware requires no such thing. The hardware is *more efficient* if special subgroups of threads within the grid perform the same actions at the same time. It's more complex than I described here, but also different from what you and whatnick state.
Eric
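
The point in the comment above about subgroups can be illustrated with a small model. The sketch assumes a subgroup width of 32 (NVIDIA warps are 32 wide, AMD GCN wavefronts are 64). It also assumes that consecutive global IDs in a 1D launch fall into the same subgroup, which is typical but implementation-defined. It counts how many subgroups would diverge on a branch. A divergent subgroup generally executes both sides of the branch with some lanes masked off, so it is slower, but the program is still correct.

```python
SUBGROUP_SIZE = 32  # assumption: real widths are device-specific

def divergent_subgroups(global_size, branch_taken):
    """Count subgroups whose work-items disagree on a branch condition."""
    count = 0
    for first in range(0, global_size, SUBGROUP_SIZE):
        lanes = range(first, min(first + SUBGROUP_SIZE, global_size))
        outcomes = {branch_taken(gid) for gid in lanes}
        if len(outcomes) > 1:   # both paths taken inside one subgroup
            count += 1
    return count

# Branching on odd/even id: every one of the 8 subgroups diverges.
print(divergent_subgroups(256, lambda gid: gid % 2 == 1))  # 8
# Branching on a split point: only the subgroup straddling id 100 diverges.
print(divergent_subgroups(256, lambda gid: gid < 100))     # 1
```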
A:

I will invoke the classic analogy of putting a square peg in a round hole. In this case the GPU is a very square hole, not as well rounded as GP (general purpose) would suggest. The explanations above put forward the idea of 2D textures, etc. The GPU architecture is such that all processing is done in streams, with the pipeline being identical in each stream, so the data being processed needs to be segmented that way.

whatnick
A: 

One reason this is a nice API is that you are typically working with an algorithm that has several nested loops. If you have one, two or three loops, then a grid of one, two or three dimensions maps nicely onto the problem, giving you one thread for each combination of index values.

So the values you need in your kernel (the index values) are expressed naturally in the API.
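
Here is a sketch of that nested-loop mapping, using matrix multiplication as a hypothetical example. The two outer loops over `i` and `j` become a 2D grid (emulated sequentially here), and each work-item runs the innermost loop itself. The function names are made up for illustration.

```python
def matmul_loops(a, b):
    """Plain triple loop: C = A x B."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_kernel(gid, a, b, c):
    """Body run by one work-item; (i, j) would come from get_global_id(0) and (1)."""
    i, j = gid
    c[i][j] = sum(a[i][k] * b[k][j] for k in range(len(b)))  # inner loop stays here

def matmul_grid(a, b):
    """Emulate an n x p grid launch of matmul_kernel."""
    n, p = len(a), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            matmul_kernel((i, j), a, b, c)
    return c
```

The `k` loop is kept inside the kernel because it is a reduction. Turning it into a third grid dimension would make many threads add into the same `c[i][j]`, which would need atomics or a separate reduction step.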

andrew cooke