Hi

I need to do an atomic floating-point add on global memory on a compute capability (CC) 2.0 device. If the global addresses referenced by a warp all fall within one aligned 128-byte sector, will these operations be done in parallel, or will they be executed one at a time?

My guess would be that they are done in parallel, but I am not sure of this.

Regards Gautham Ganapathy

A: 

Atomic operations are slower than normal operations, because they really can't happen in parallel.

What will probably happen is that each add will be done one at a time, but execution won't progress past the add until all the threads have completed it, so it will look parallel from the code's perspective.

I'm not sure if the access will be coalesced or not, but the speed penalty from the atomic operations will probably outweigh the memory access speed benefit.

interfect
True. However, what I have been wondering is this: the G200 device memory controller is intelligent enough to resolve conflicts and uncoalesced read/write accesses from a half-warp. Assuming that the memory controller had enough independent atomic-op execution units, perhaps all the operations across a half-warp could be done in parallel, without being interrupted by other device memory requests. Is this possible?
Gautham Ganapathy
For example, if each thread in a warp performs an update such as atomicAdd(baseAddress + tid, x), all operations for a half-warp could be done in parallel by the memory controller if it had 16 adders instead of 1. The question is, is this the case?
Gautham Ganapathy
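
To make the two cases in this thread concrete, here is a minimal sketch. It assumes a CC 2.0 or later device, where atomicAdd on float in global memory is available. The kernel names addDistinct and addSame and the host helper runAtomicAddSketch are made up for illustration. The first kernel is the atomicAdd(baseAddress + tid, x) pattern from the comment above, where every thread in a warp targets a different address. The second has every thread target one address, where the adds have to be applied one after another to give a correct sum. The sketch only checks that the results are correct. It says nothing about how many adds the hardware actually does in parallel; timing the two kernels (for example with CUDA events) would be one way to look into that.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pattern from the thread: each thread adds x to its own element, so the
// addresses touched by a warp are distinct and contiguous.
__global__ void addDistinct(float *baseAddress, float x, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        atomicAdd(baseAddress + tid, x);
}

// Contended pattern: every thread adds x to the same address.
__global__ void addSame(float *sum, float x, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        atomicAdd(sum, x);
}

// Hypothetical host helper: runs both kernels and checks the results.
bool runAtomicAddSketch()
{
    const int n = 1024;
    const int threads = 256;
    const int blocks = n / threads;

    float *d_data = NULL;
    float *d_sum = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_sum, sizeof(float)) != cudaSuccess) {
        cudaFree(d_data);
        return false;
    }
    // All-zero bytes are 0.0f for IEEE-754 floats.
    cudaMemset(d_data, 0, n * sizeof(float));
    cudaMemset(d_sum, 0, sizeof(float));

    // Two launches, so each element should end up as 2.0f.
    addDistinct<<<blocks, threads>>>(d_data, 1.0f, n);
    addDistinct<<<blocks, threads>>>(d_data, 1.0f, n);
    // n threads each add 1.0f, so the sum should be exactly n.
    addSame<<<blocks, threads>>>(d_sum, 1.0f, n);

    static float h_data[n];
    float h_sum = 0.0f;
    // cudaMemcpy waits for the kernels above to finish.
    cudaError_t e1 = cudaMemcpy(h_data, d_data, n * sizeof(float),
                                cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(&h_sum, d_sum, sizeof(float),
                                cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_sum);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return false;

    bool ok = (h_sum == (float)n);
    for (int i = 0; i < n; ++i)
        ok = ok && (h_data[i] == 2.0f);
    return ok;
}

int main()
{
    bool ok = runAtomicAddSketch();
    printf("%s\n", ok ? "results correct" : "results wrong or CUDA error");
    return ok ? 0 : 1;
}
```

Built with something like nvcc -arch=sm_20 on a toolkit that still supports Fermi (or a newer -arch value on current toolkits), both kernels should give correct results; any difference between them would show up only in timing.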