
Question

Answer 1


One of the main drivers of register count is the amount of local data (local variables and per-thread arrays) you declare in your kernel. However, PTX code uses an unlimited supply of virtual registers, and the PTX assembler (ptxas) does a good job of reusing physical registers when it allocates them. So you cannot easily work out the final count from the PTX code; you need to run ptxas to get the real answer.
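A minimal sketch of the kind of local data meant here, assuming a hypothetical kernel in which each thread keeps a small private array (window_sum, window_sum_kernel and WINDOW are illustrative names, not from the question). Whether window actually ends up in registers or in local memory is the compiler's decision, so compile it with nvcc -Xptxas -v and read the ptxas report rather than guessing from the source or the PTX. The helper is marked __host__ __device__ so its logic can also be checked on the CPU.

```cuda
// Hypothetical example: each thread keeps a private window of WINDOW floats.
// After the loops are unrolled, every index into `window` is a compile-time
// constant, so the compiler can usually keep the array in registers.
constexpr int WINDOW = 8;

__host__ __device__ inline float window_sum(const float* in, int base, int n)
{
    float window[WINDOW];  // per-thread local data that drives register use

#pragma unroll
    for (int k = 0; k < WINDOW; ++k) {
        int idx = base + k;
        window[k] = (idx < n) ? in[idx] : 0.0f;  // zero-pad past the end
    }

    float s = 0.0f;
#pragma unroll
    for (int k = 0; k < WINDOW; ++k) {
        s += window[k];
    }
    return s;
}

// Each thread writes the sum of the WINDOW inputs starting at its own index.
__global__ void window_sum_kernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = window_sum(in, i, n);
    }
}
```

Growing WINDOW, or indexing window with a value known only at run time, typically raises the reported register count or shows up as local-memory (spill) usage in the same ptxas report.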

Edric 2010-10-07 06:37:04

Answer 2


Compiling with nvcc -Xptxas -v prints the per-kernel diagnostic information Edric mentioned, including the number of registers used. You can also force the compiler to conserve registers with the __launch_bounds__ qualifier. For example

__global__ void
__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
MyKernel(...)
{ 
   ...
}

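tells the compiler that MyKernel will never be launched with more than maxThreadsPerBlock threads per block (a launch with more threads fails), and asks it to limit register usage so that at least minBlocksPerMultiprocessor such blocks can be resident on a single SM at once. Other resources, such as shared memory, can still reduce the number of resident blocks. See Section B.16 of the CUDA Programming Guide for a complete explanation of __launch_bounds__.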
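To make the template concrete, here is a minimal sketch with the placeholders filled in. The values 256 and 4, the kernel scale_kernel, and the helpers max_regs_per_thread and launch_scale are hypothetical. The register arithmetic assumes an SM with 65,536 32-bit registers (Fermi-generation GPUs have 32,768) and ignores register allocation granularity, so treat it as a rough upper bound rather than the exact limit ptxas will apply.

```cuda
// Illustrative launch bounds (assumptions, not values from the answer).
constexpr int kMaxThreadsPerBlock = 256;
constexpr int kMinBlocksPerSM     = 4;

// Rough per-thread register budget implied by the launch bounds: all
// min_blocks * max_threads resident threads share the SM's register file.
// e.g. 65536 / (256 * 4) = 64. Real hardware allocates registers in chunks,
// so the compiler's actual cap may be somewhat lower.
constexpr int max_regs_per_thread(int regs_per_sm, int max_threads, int min_blocks)
{
    return regs_per_sm / (max_threads * min_blocks);
}

__global__ void
__launch_bounds__(kMaxThreadsPerBlock, kMinBlocksPerSM)
scale_kernel(const float* in, float* out, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = factor * in[i];
    }
}

// Host-side launcher: the block size must not exceed kMaxThreadsPerBlock,
// otherwise the launch fails.
void launch_scale(const float* d_in, float* d_out, float factor, int n)
{
    int blocks = (n + kMaxThreadsPerBlock - 1) / kMaxThreadsPerBlock;
    scale_kernel<<<blocks, kMaxThreadsPerBlock>>>(d_in, d_out, factor, n);
}
```

With these bounds the compiler should aim for roughly 64 registers per thread or fewer (about 32 on a 32K-register SM), spilling to local memory if the kernel needs more; nvcc -Xptxas -v shows what it actually chose.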

wnbell 2010-10-09 04:29:24

cuda SM register limit