I was stepping through some C/CUDA code in the debugger, something like:
for(uint i = threadIdx.x; i < 8379; i+=256) 
    sum += d_PartialHistograms[blockIdx.x + i * HISTOGRAM64_BIN_COUNT];
I was utterly confused because the debugger stepped over the whole loop in a single step, although the output was correct. I then realised that when I put curly brackets around the loop body, as in the following snippet, stepping in the debugger behaved as expected.
for(uint i = threadIdx.x; i < 8379; i+=256) {
    sum += d_PartialHistograms[blockIdx.x + i * HISTOGRAM64_BIN_COUNT];
}
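As a sanity check, here is a minimal host-side sketch in plain C (not the actual kernel) comparing the two forms. The function names, the `N_ELEMENTS` and `STRIDE` macros, and the flat `data` array are hypothetical stand-ins I made up for illustration. As far as I understand the language, a single statement after `for` is the loop body with or without braces, so both functions should return the same sum.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the bounds used in the kernel above. */
#define N_ELEMENTS 8379u
#define STRIDE     256u

/* Loop body written as a single statement, without braces. */
unsigned long sum_without_braces(const unsigned *data, unsigned start)
{
    unsigned long sum = 0;
    for (unsigned i = start; i < N_ELEMENTS; i += STRIDE)
        sum += data[i];
    return sum;
}

/* Same loop, with the body wrapped in a compound statement. */
unsigned long sum_with_braces(const unsigned *data, unsigned start)
{
    unsigned long sum = 0;
    for (unsigned i = start; i < N_ELEMENTS; i += STRIDE) {
        sum += data[i];
    }
    return sum;
}
```

Calling both with the same array and the same `start` gives identical results, which matches the correct output I saw and makes me suspect the difference is only in how the debugger maps steps to source lines.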
So are brace-free for loops treated differently in C or by the debugger, or is this particular to CUDA?
Thanks