How are threads organized to be executed by a GPU?
Q:
Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation)
Please help me make the answer clear, keeping in mind that it is for beginners. I needed such an answer some months ago, when starting with CUDA, and I guess this could be a great help to them.
cibercitizen1
2010-03-06 11:23:26
A:
Suppose a 9800GT GPU: it has 14 multiprocessors, each with 8 thread processors, and the warp size is 32, which means each thread processor handles up to 32 threads. 14*8*32=3584 is the maximum number of actual concurrent threads.
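The arithmetic above can be written out as a small host-side C++ sketch. The function name `max_concurrent_threads` and its parameter names are hypothetical, chosen only for illustration. The inputs are the figures quoted in this answer, not values queried from a device.

```cpp
// Hypothetical helper: multiplies out the estimate used in this answer.
// The arguments are the 9800GT figures quoted above (an assumption taken
// from the answer, not values read back from the hardware).
constexpr int max_concurrent_threads(int multiprocessors,
                                     int processors_per_multiprocessor,
                                     int warp_size) {
    return multiprocessors * processors_per_multiprocessor * warp_size;
}
```

Calling `max_concurrent_threads(14, 8, 32)` gives 3584, the number used in the rest of this answer.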
If you execute this kernel with more than 3584 threads (say 4000 threads; it is not important how you define the block and grid, the GPU will treat them the same):
func1();
__syncthreads();
func2();
__syncthreads();
then the order of execution of those two functions is as follows:
1. func1 is executed for the first 3584 threads
2. func2 is executed for the first 3584 threads
3. func1 is executed for the remaining threads
4. func2 is executed for the remaining threads
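As a rough illustration of the steps above, here is a host-side C++ sketch that splits a thread count into passes of at most 3584 threads. The helper name `split_into_passes` is hypothetical. It takes this answer's simplified model at face value. On real hardware `__syncthreads()` is a barrier only among the threads of one block, and the order in which blocks run is not specified, so treat the pass structure as an idealization.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper following this answer's simplified model: threads run
// in successive passes, each holding at most `capacity` threads.
// Returns the number of threads in each pass, in order.
std::vector<int> split_into_passes(int total_threads, int capacity) {
    assert(capacity > 0);  // a non-positive capacity would never make progress
    std::vector<int> passes;
    for (int remaining = total_threads; remaining > 0; remaining -= capacity) {
        // The last pass may be smaller than the capacity.
        passes.push_back(remaining < capacity ? remaining : capacity);
    }
    return passes;
}
```

With the numbers from this answer, `split_into_passes(4000, 3584)` yields two passes of 3584 and 416 threads. Under the model described, both func1 and func2 run for the first pass before the second pass starts.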
Bizz
2010-06-14 06:25:52