
Page 51 of the Compute Visual Profiler User Guide states:

"Note that in case the number [of] blocks in a kernel is less than or not a multiple of the number of multiprocessors the counters values across multiple runs will not be consistent."

Is that an inclusive or exclusive "or" statement? Does it always have to be a multiple?

A:

The inconsistency mentioned in the docs is caused by load imbalance between multiprocessors.

For instance, if you run a kernel with 15 blocks on a Tesla C2050, which has 14 multiprocessors, one of the multiprocessors will end up also running the threads of the one "extra" block. If, in one profiling run, the profiler happens to collect data from this multiprocessor, which runs threads of two blocks, but in another run from a multiprocessor that runs threads of only a single block, the results will obviously differ.

To answer the question you actually asked, the "or" is inclusive, as is usual in natural language.

Although I do not remember it being mentioned in the documentation, I can imagine that even when both conditions are false, profiling results can be inconsistent if the data itself causes imbalance (for example, when the amount of arithmetic, the amount of data processed, or the memory access pattern depends on the data).

pszilard