I am working with CUDA and have a problem with thread synchronization. In my code I need different parts to be executed by different sets of threads, like this:
one thread ->
all threads ->
one thread ->
That is, the first part of the code should be executed by only one thread, the next part by all threads, and the last part by a single thread again. All of this happens inside a loop that every thread runs. Can anyone tell me how to do that? It's kind of urgent, and I'd be grateful for any help.
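For reference, one common way to get this one-thread / all-threads / one-thread structure is to guard the single-thread parts with a thread-index check and separate the phases with `__syncthreads()`. Below is a minimal sketch, assuming all threads belong to a single block (`__syncthreads()` only synchronizes threads within one block). The names `phasedKernel`, `runPhases`, `data`, and `total` are hypothetical and only there for illustration, and the work done in each phase is a placeholder.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread, then all threads, then one thread, in a loop.
// Assumes a single thread block.
__global__ void phasedKernel(int *data, int *total, int iterations)
{
    int tid = threadIdx.x;

    for (int it = 0; it < iterations; ++it) {
        // Phase 1: only thread 0 executes this part.
        if (tid == 0) {
            *total = 0;
        }
        // The barrier sits outside the if, so every thread reaches it.
        __syncthreads();

        // Phase 2: all threads execute this part.
        data[tid] = tid + it;
        __syncthreads();  // thread 0 must not read data[] before all writes finish

        // Phase 3: only thread 0 executes this part.
        if (tid == 0) {
            for (int i = 0; i < blockDim.x; ++i) {
                *total += data[i];
            }
        }
        __syncthreads();  // all threads finish this iteration before the loop repeats
    }
}

// Hypothetical host helper: launches one block and returns the final total.
int runPhases(int threads, int iterations)
{
    int *dData = nullptr;
    int *dTotal = nullptr;
    int hTotal = -1;

    cudaMalloc(&dData, threads * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));

    phasedKernel<<<1, threads>>>(dData, dTotal, iterations);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&hTotal, dTotal, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dData);
    cudaFree(dTotal);
    return hTotal;
}

int main()
{
    printf("total = %d\n", runPhases(4, 3));
    return 0;
}
```

The important detail is that every `__syncthreads()` is placed outside the `if (tid == 0)` blocks, so all threads in the block reach the same barrier; putting the barrier inside a branch that only some threads take can hang the kernel. If the threads are spread over several blocks, this sketch is not enough, and something like splitting the phases into separate kernel launches would probably be needed instead.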
Thanks