Some of the concepts and designs of the "SIMT" architecture are still unclear to me.

From what I've seen and read, diverging code paths, and if() in general, are a rather bad idea because many threads might execute in lockstep. What exactly does that mean? What about something like:

kernel void foo(..., int flag)
{
    if (flag)
        DO_STUFF
    else
        DO_SOMETHING_ELSE
}

The parameter "flag" is the same for all work-items, so the same branch is taken by all of them. Is a GPU going to execute all of the code anyway, serializing everything and effectively still stepping through the branch that is not taken? Or is it a bit more clever, executing only the taken branch as long as all threads agree on it, which would always be the case here?

I.e. does serialization ALWAYS happen or only if needed? Sorry for the stupid question. ;)

+1  A: 

No, it doesn't always happen. Both branches are executed only if the condition is not coherent between threads in a local work group (in practice, within the subset of the work group that runs in lockstep, such as a warp on NVIDIA hardware), that is, if it evaluates to different values for different work-items. In that case, current-generation GPUs execute both branches, but only the work-items on the correct branch write values and have side effects.

So, maintaining coherency is vital to performance in GPU branches.
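
To make the "both branches run, but only the correct one has side effects" behaviour concrete, here is a rough host-side simulation in plain C. It is only a sketch of the idea, not how any particular GPU is implemented: `LANES`, `run_branch_lockstep` and the values 1 and 2 standing in for DO_STUFF and DO_SOMETHING_ELSE are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lockstep width. 32 matches an NVIDIA warp; other hardware differs. */
#define LANES 32

/* Simulates one lockstep group running
 *     if (cond) out = 1; else out = 2;
 * Returns how many branch bodies the group had to step through (1 or 2). */
int run_branch_lockstep(const bool cond[LANES], int out[LANES])
{
    assert(cond != NULL && out != NULL);

    /* First find out whether the lanes agree on the condition. */
    bool any_true = false, any_false = false;
    for (size_t i = 0; i < LANES; ++i) {
        if (cond[i]) any_true = true;
        else         any_false = true;
    }

    int passes = 0;
    if (any_true) {
        /* "then" pass: lanes whose condition is false are masked off. */
        for (size_t i = 0; i < LANES; ++i)
            if (cond[i]) out[i] = 1;
        ++passes;
    }
    if (any_false) {
        /* "else" pass: lanes whose condition is true are masked off. */
        for (size_t i = 0; i < LANES; ++i)
            if (!cond[i]) out[i] = 2;
        ++passes;
    }
    return passes;
}
```

With a uniform condition (like the `flag` parameter in the question) the function makes one pass; as soon as a single lane disagrees it makes two, and every lane still ends up with the value from its own branch only.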

Matias Valdenegro
I wouldn't exactly call it vital. It is an important factor in maintaining performance, and how much it matters depends on how complex the branches are. But at least in most situations I have encountered so far, it was far more important to optimize memory access (coalescing, avoiding bank conflicts, reducing traffic in general) than to make sure the branches evaluate to the same path (which in some cases would have worked against those optimizations).
Grizzly
@Grizzly It is vital if you use branches :)
Matias Valdenegro
+1  A: 

Not sure about ATI, but NVIDIA hardware is clever about it: there is no serialization if every thread in a warp goes the same way.

+1  A: 

In your example, flag has the same value for all work-items, so a good compiler will generate code that takes all work-items in the same direction.

But consider the following case:

kernel void foo(..., int *buffer)
{
    if (buffer[get_global_id(0)])
        DO_STUFF
    else
        DO_SOMETHING_ELSE
}

Here it is not guaranteed that all work-items will take the same path, so serialization or control-flow elimination is required.
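
Whether a data-dependent branch like this actually costs anything depends on how the values are laid out. As a rough way to reason about it, the plain C helper below counts how many lockstep groups would see mixed conditions for a given buffer. This is a host-side sketch under assumptions: `count_divergent_groups` and its `width` parameter are hypothetical, and it assumes consecutive global ids are packed into the same lockstep group, which the hardware and driver ultimately decide.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many groups of `width` consecutive work-items would diverge on
 *     if (buffer[get_global_id(0)]) ... else ...
 * A group diverges when it contains both zero and non-zero entries. */
size_t count_divergent_groups(const int *buffer, size_t n, size_t width)
{
    assert(buffer != NULL && width > 0);

    size_t divergent = 0;
    for (size_t start = 0; start < n; start += width) {
        /* The last group may be shorter than `width`. */
        size_t end = (start + width < n) ? start + width : n;

        int saw_true = 0, saw_false = 0;
        for (size_t i = start; i < end; ++i) {
            if (buffer[i]) saw_true = 1;
            else           saw_false = 1;
        }
        if (saw_true && saw_false)
            ++divergent;
    }
    return divergent;
}
```

For example, with a width of 4 the buffer {1,1,1,1,0,0,0,0} gives no divergent groups, while {1,0,1,0,1,0,1,0} makes both groups diverge, even though the two buffers contain the same number of ones.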

zr
The decision whether or not the code paths diverge should happen in hardware on a case-by-case basis, meaning that for your example serialization might or might not happen, depending on the values in buffer (it might happen for some workgroups but not for others).
Grizzly