Say I have a CUDA kernel:
__global__ void foo(int a, int b)
{
... ...
}
Where are a and b stored? Do they take up register space in each thread?
No, a and b do not take up register space in each thread just by being kernel arguments. They are allocated once in constant space, a read-only space from which all threads can read (the compiler may still copy a value into a register at the point where a thread uses it).
Note that this space is cached by the constant cache, which is shared by all threads:
A read-only constant cache that is shared by all scalar processor cores and speeds up reads from the constant memory space, which is a read-only region of device memory [PTX ISA Version 2.1 Chapter 3].
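To make this concrete, here is a minimal sketch of a kernel that reads its by-value arguments in every thread. The extra parameters out and n and the host helper run_foo are hypothetical names added only for illustration; they are not part of the question. The launch writes a and b once, and each thread reads the same copies rather than receiving its own.

```cuda
#include <cuda_runtime.h>
#include <vector>

// a and b are by-value kernel arguments: the launch stores them once and
// every thread reads the same copies. out and n are hypothetical additions
// so that the kernel has an observable result.
__global__ void foo(int a, int b, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a * i + b;  // a and b are read here by every thread
}

// Hypothetical host helper: launches foo and copies the results back.
std::vector<int> run_foo(int a, int b, int n)
{
    if (n <= 0)
        return std::vector<int>();

    std::vector<int> host(n, 0);
    int *dev = nullptr;
    if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess)
        return host;  // allocation failed, return zeros

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    foo<<<blocks, threads>>>(a, b, dev, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling run_foo(2, 3, 4) fills the result with 2 * i + 3 for each index i. If you want to check the resource usage yourself, compiling with nvcc -Xptxas -v should report the register count per thread and the constant memory used by the kernel; the exact numbers depend on the target architecture and compiler version.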