Compiling CUDA code with immediate (integer) operands, are they held in the instruction stream, or are they placed into memory? Specifically I'm thinking about 24 or 32 bit unsigned integer operands.
I haven't been able to find information about this in any of the CUDA documentation I've examined so far. So references to any documents on specific uarch details like this would be perfect, as I don't currently have a good model for how CUDA works at this level.