Some of the concepts and designs of the "SIMT" architecture are still unclear to me.
From what I've seen and read, divergent code paths, and if() in general, are a rather bad idea because many threads execute in lockstep. What exactly does that mean? What about something like this:
kernel void foo(..., int flag)
{
    if (flag)
        DO_STUFF
    else
        DO_SOMETHING_ELSE
}
The parameter "flag" has the same value for every work-item, so every work-item takes the same branch. Will a GPU still step through both branches, serializing everything and masking off the results of the branch that was not taken? Or is it a bit more clever, executing only the taken branch as long as all threads agree on it, which would always be the case here?
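To pin down the two behaviours I have in mind, here is a toy model in plain C (host code, not a kernel). It only counts how many sides of one if/else a group of lockstep threads would have to step through under each hypothesis; it does not claim to describe any real hardware. All names are made up for illustration, and the group size of 32 is just an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed number of threads that run in lockstep (hypothetical value). */
enum { GROUP_SIZE = 32 };

/* Hypothesis A: the group always walks through both sides of the if/else,
 * masking off the threads that did not take the current side. */
int paths_always_serialized(const bool takes_if[GROUP_SIZE])
{
    (void)takes_if;  /* the per-thread decisions do not matter here */
    return 2;
}

/* Hypothesis B: a side is executed only if at least one thread in the
 * group needs it, so a unanimous group executes a single side. */
int paths_serialized_if_needed(const bool takes_if[GROUP_SIZE])
{
    bool any_if = false;
    bool any_else = false;

    for (size_t i = 0; i < GROUP_SIZE; ++i) {
        if (takes_if[i])
            any_if = true;
        else
            any_else = true;
    }
    return (int)any_if + (int)any_else;
}
```

With "flag" identical for every thread, takes_if[] is all true or all false, so the second function returns 1 while the first always returns 2. The two only agree once at least one thread disagrees with the rest.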
I.e., does serialization ALWAYS happen, or only when the threads actually diverge? Sorry for the stupid question. ;)