Hi
I need to perform an atomic floating-point add on global memory on a compute capability (CC) 2.0 device. If the global memory addresses referenced by a warp all fall within a single aligned 128-byte segment, will these atomic operations be performed in parallel, or will they be executed one at a time?
My guess is that they are performed in parallel, but I am not sure.
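To make the access pattern concrete, here is a minimal sketch of the kind of kernel I have in mind. The kernel name `warpAtomicAdd`, the buffer size, and the choice that each thread targets a different float are assumptions for illustration only; the same question applies if several threads hit the same address. It relies on the float overload of `atomicAdd`, which requires compute capability 2.0 or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each of the 32 threads in a warp atomically adds to its
// own float. 32 floats * 4 bytes = 128 bytes, so the whole warp touches one
// 128-byte region.
__global__ void warpAtomicAdd(float *data, float value)
{
    int lane = threadIdx.x % 32;      // lane index within the warp
    atomicAdd(&data[lane], value);    // float atomicAdd needs CC 2.0 or higher
}

int main()
{
    const int n = 32;
    float *d_data = NULL;

    // Memory returned by cudaMalloc is aligned to at least 256 bytes,
    // so these 32 floats sit in one aligned 128-byte region.
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Launch a single warp.
    warpAtomicAdd<<<1, n>>>(d_data, 1.0f);

    float h_data[n];
    cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost);
    printf("data[0] = %f\n", h_data[0]);

    cudaFree(d_data);
    return 0;
}
```

The question is whether the 32 `atomicAdd` calls issued by this warp are serviced together or serialized.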
Regards,
Gautham Ganapathy