ansaurus

Question

How to implement a critical section in CUDA?

Answer 1

+2 A:

Okay, I figured it out, and this is yet another one of the CUDA paradigm pains.

As any good CUDA programmer knows (notice that I did not remember this, which I think makes me a bad CUDA programmer), all threads in a warp execute the same instruction in lockstep. The code I wrote would work perfectly if not for this fact. As it is, however, two threads in the same warp are likely to contend for the same lock. If one of them acquires the lock, it simply exits the loop, but it cannot continue past the loop until all other threads in its warp have also completed it. Unfortunately, the other thread will never complete the loop, because it is waiting for the first one to release the lock.

Here is a kernel that will do the trick without error:

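__global__ void k_testLocking(unsigned int* locks, int n) {
    int id = threadIdx.x % n;  // index of the lock this thread contends for
    bool leaveLoop = false;
    // Every thread stays in the loop until it has acquired AND released its lock,
    // so the lock holder never has to wait at the loop exit for its warp-mates.
    while (!leaveLoop) {
        // atomicExch returns the previous value: 0u means the lock was free.
        if (atomicExch(&(locks[id]), 1u) == 0u) {
            // critical section
            leaveLoop = true;
            atomicExch(&(locks[id]), 0u);  // release the lock
        }
    }
}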

quadelirus 2010-01-07 15:06:16

This has been discussed several times on the NVIDIA forums. I think the conclusion is that this only works if you can ensure that the number of blocks is less than or equal to the number of multiprocessors. If not, it can lead to deadlock. In other words, try to find another way of implementing your algorithm that doesn't require critical sections.
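
As one illustration of "another way": when the protected work is a simple read-modify-write, such as adding to a counter or a histogram bin, a hardware atomic can often replace the lock entirely. The sketch below assumes that case only. The names k_accumulate, bins and runAccumulate are hypothetical and do not come from this thread.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds 1 to one of n bins.
// atomicAdd performs the read-modify-write indivisibly, so no lock is needed.
__global__ void k_accumulate(unsigned int* bins, int n) {
    int id = threadIdx.x % n;
    atomicAdd(&bins[id], 1u);
}

// Hypothetical host helper: zeroes n bins on the device, launches the kernel,
// and returns the bin contents. Returns an empty vector if a CUDA call fails.
std::vector<unsigned int> runAccumulate(int n, int blocks, int threadsPerBlock) {
    std::vector<unsigned int> host(n, 0u);
    unsigned int* dev = nullptr;
    size_t bytes = n * sizeof(unsigned int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    cudaError_t err = cudaMemset(dev, 0, bytes);
    if (err == cudaSuccess) {
        k_accumulate<<<blocks, threadsPerBlock>>>(dev, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling runAccumulate(4, 2, 64) spreads 128 threads evenly over 4 bins. This only covers updates that map onto a single atomic operation; longer critical sections would need a different restructuring.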

Eric 2010-01-09 11:26:08

Answer 2

A:

By the way, you have to remember that global memory writes and reads aren't guaranteed to complete at the point where you write them in the code, so for this to work in practice you need to add a global memory fence, i.e. __threadfence().
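
A minimal sketch of where such fences might go, assuming the critical section increments a plain (non-atomic) counter in global memory. The names k_lockedIncrement, counters and runLockedIncrement are hypothetical. The counters are declared volatile here as an extra assumption, so that each thread re-reads them from memory rather than from a stale cached copy.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: n spin locks guard n plain counters in global memory.
__global__ void k_lockedIncrement(unsigned int* locks,
                                  volatile unsigned int* counters, int n) {
    int id = threadIdx.x % n;
    bool leaveLoop = false;
    while (!leaveLoop) {
        if (atomicExch(&locks[id], 1u) == 0u) {  // acquired: lock was free
            __threadfence();         // order the acquire before the accesses below
            counters[id] = counters[id] + 1u;    // critical section
            __threadfence();         // make the write visible before releasing
            leaveLoop = true;
            atomicExch(&locks[id], 0u);          // release the lock
        }
    }
}

// Hypothetical host helper: runs the kernel and returns the n counters.
// Returns an empty vector if a CUDA call fails.
std::vector<unsigned int> runLockedIncrement(int n, int blocks, int threadsPerBlock) {
    std::vector<unsigned int> host(n, 0u);
    unsigned int* locks = nullptr;
    unsigned int* counters = nullptr;
    size_t bytes = n * sizeof(unsigned int);
    if (cudaMalloc(&locks, bytes) != cudaSuccess) return {};
    if (cudaMalloc(&counters, bytes) != cudaSuccess) { cudaFree(locks); return {}; }
    cudaError_t err = cudaMemset(locks, 0, bytes);  // all locks start free
    if (err == cudaSuccess) err = cudaMemset(counters, 0, bytes);
    if (err == cudaSuccess) {
        k_lockedIncrement<<<blocks, threadsPerBlock>>>(locks, counters, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(host.data(), counters, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(locks);
    cudaFree(counters);
    if (err != cudaSuccess) return {};
    return host;
}
```

If the lock works as intended, runLockedIncrement(4, 2, 64) leaves each counter equal to the number of threads that mapped to it; a lost update would show up as a smaller value.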

eri 2010-01-20 15:34:14
