
I know that the number of blocks running on one SM is limited by the per-SM block limit, the thread count, shared memory, and registers. Is there any strategy for avoiding excessive register usage? I don't want my kernel to use so many registers that it ends up limiting the number of blocks I can run on one SM.

+1  A: 

One of the main drivers of the register count is the amount of local data you declare in your kernel. However, the PTX assembler (ptxas) does quite a good job of reusing registers, so it's not always easy to work out from the PTX code how many will be used. You need to run ptxas to get the real answer.
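As a rough illustration of checking the real number, the sketch below reads a compiled kernel's register count at runtime with `cudaFuncGetAttributes` and estimates how many blocks registers alone would allow per SM. The kernel `scaleKernel`, the helper `regLimitedBlocksPerSM`, and the block size of 256 are assumptions made up for this example, and the estimate ignores the hardware's register allocation granularity, so treat it as an upper bound rather than an exact figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to have something to inspect.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Rough upper bound on resident blocks per SM from registers alone.
// Assumption: ignores allocation granularity and every other limit
// (threads, shared memory, per-SM block limit).
int regLimitedBlocksPerSM(int regsPerSM, int regsPerThread, int threadsPerBlock)
{
    if (regsPerThread <= 0 || threadsPerBlock <= 0) return 0;
    return regsPerSM / (regsPerThread * threadsPerBlock);
}

int main()
{
    // Register and local-memory usage of the compiled kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Register file size of device 0.
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int threadsPerBlock = 256;  // assumed launch configuration
    printf("registers per thread:    %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    printf("register-limited blocks per SM (estimate): %d\n",
           regLimitedBlocksPerSM(prop.regsPerMultiprocessor,
                                 attr.numRegs, threadsPerBlock));
    return 0;
}
```

A non-zero local memory size usually means some per-thread data did not stay in registers and went to local memory instead.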

Edric
+1  A: 

Compiling with nvcc -Xptxas -v will print out the diagnostic information Edric mentioned. Additionally, you can force the compiler to conserve registers using the __launch_bounds__ qualifier. For example

__global__ void
__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
MyKernel(...)
{ 
   ...
}

This asks the compiler to cap register usage so that, as far as registers are concerned, at least minBlocksPerMultiprocessor blocks of up to maxThreadsPerBlock threads can be resident on a single SM. See Section B.16 of the CUDA Programming Guide for a complete explanation of __launch_bounds__.
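To make this concrete, here is a minimal sketch with actual numbers filled in. The kernel `boundedKernel`, the helper `activeBlocksPerSM`, and the values 256 and 4 are assumptions chosen for illustration, not recommendations. It then asks the runtime how many blocks it expects to be resident per SM using `cudaOccupancyMaxActiveBlocksPerMultiprocessor`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed values for illustration only.
constexpr int kMaxThreadsPerBlock = 256;
constexpr int kMinBlocksPerSM     = 4;

// Hypothetical kernel compiled under the launch bounds above.
__global__ void
__launch_bounds__(kMaxThreadsPerBlock, kMinBlocksPerSM)
boundedKernel(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i] + 1.0f;
}

// Returns the number of blocks of the given size the runtime expects
// to be resident on one SM, or -1 if the query fails.
int activeBlocksPerSM(int blockSize)
{
    int numBlocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks, boundedKernel, blockSize, 0 /* dynamic shared memory */);
    return err == cudaSuccess ? numBlocks : -1;
}

int main()
{
    int blocks = activeBlocksPerSM(kMaxThreadsPerBlock);
    if (blocks < 0) {
        printf("occupancy query failed\n");
        return 1;
    }
    printf("resident blocks per SM at %d threads per block: %d\n",
           kMaxThreadsPerBlock, blocks);
    return 0;
}
```

Launching with more than kMaxThreadsPerBlock threads per block makes the launch fail, so the first argument should match the largest block size you actually use. If the bounds are too tight, the compiler may spill values to local memory, so it is worth checking the ptxas output after adding them.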

wnbell