How are threads organized to be executed by a GPU?

+3  A: 
cibercitizen1
Help me make the answer clear, keeping in mind that it is for beginners. I needed such an answer some months ago, when starting with CUDA, and I guess it could be a great help to them.
cibercitizen1
A: 

The CUDA Programming Guide should be a good place to start for this. I would also recommend checking out the CUDA introduction slides from here.

Tom
A: 

Suppose a 9800GT GPU: it has 14 multiprocessors, each with 8 thread processors, and the warp size is 32, which means each thread processor handles up to 32 threads. So 14 * 8 * 32 = 3584 is the maximum number of actual concurrent threads.
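
The arithmetic above can be checked with a small host-side sketch. The helper name `maxConcurrentThreads` and its parameter names are made up for illustration, and the numbers are simply the figures quoted for the 9800GT in this answer. It reproduces the answer's simplified model and does not query real hardware limits.

```cuda
#include <cassert>
#include <cstdio>

// Hypothetical helper: multiplies the three figures quoted in the answer.
// It reproduces the answer's simplified model, not a limit reported by the driver.
constexpr int maxConcurrentThreads(int multiprocessors,
                                   int threadProcessorsPerMP,
                                   int warpSize)
{
    return multiprocessors * threadProcessorsPerMP * warpSize;
}

int main()
{
    // Figures quoted for the 9800GT: 14 multiprocessors, 8 thread processors, warp size 32.
    const int total = maxConcurrentThreads(14, 8, 32);
    assert(total == 3584);
    printf("concurrent threads: %d\n", total);

    // For the 4000-thread launch discussed in this answer, this many threads
    // would not fit in the first batch under the same model.
    printf("remaining threads:  %d\n", 4000 - total);
    return 0;
}
```

This block is host-only code, so it compiles with `nvcc` and runs without a GPU present.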

If you execute this kernel with more than 3584 threads (say 4000 threads; it does not matter how you define the block and grid, the GPU will treat them the same way):

func1();
__syncthreads();
func2();
__syncthreads();

then the order of execution of those two functions is as follows:

1. func1 is executed for the first 3584 threads

2. func2 is executed for the first 3584 threads

3. func1 is executed for the remaining threads

4. func2 is executed for the remaining threads
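
For anyone who wants to try this, here is a minimal sketch of the kernel above as a complete program launched with 4000 threads. The bodies of `func1` and `func2`, the kernel name `twoPhase`, the helper `blocksNeeded`, and the block size of 256 are all assumptions for illustration; the answer does not specify them. One caveat the sketch makes explicit: `__syncthreads()` is a barrier only among the threads of one block, so the batch ordering listed above is best read as a rough picture rather than an ordering guaranteed across blocks.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder bodies (assumptions): the answer does not say what func1/func2 do.
__device__ void func1(int *data, int i) { data[i] = i; }
__device__ void func2(int *data, int i) { data[i] *= 2; }

__global__ void twoPhase(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = i < n;   // the last block has threads past the end of the array

    if (active) func1(data, i);
    __syncthreads();       // kept outside the branch so every thread of the block reaches it
    if (active) func2(data, i);
    __syncthreads();       // barrier for this block only, not for the whole grid
}

// Hypothetical helper: number of blocks needed to cover n threads (rounds up).
constexpr int blocksNeeded(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int n = 4000;               // thread count used in the answer
    const int threadsPerBlock = 256;  // assumed block size
    const int blocks = blocksNeeded(n, threadsPerBlock);
    assert(blocks * threadsPerBlock >= n);

    int *d = nullptr;
    if (cudaMalloc((void **)&d, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    twoPhase<<<blocks, threadsPerBlock>>>(d, n);
    cudaError_t err = cudaDeviceSynchronize();  // wait for the kernel and surface errors
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d);
        return 1;
    }

    int last = 0;
    cudaMemcpy(&last, d + (n - 1), sizeof(int), cudaMemcpyDeviceToHost);
    printf("data[%d] = %d\n", n - 1, last);

    cudaFree(d);
    return 0;
}
```

With these placeholder bodies each thread touches only its own element, so the result does not depend on the order in which blocks run.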

Bizz