Here is some C++ code that is accessed from multiple threads in parallel. It contains a critical section:
```cpp
lock.Acquire();
current_id = shared_id;
// small amounts of other code
shared_id = (shared_id + 1) % max_id;
lock.Release();
// do something with current_id
```
The lock variable's class is a wrapper around the POSIX mutex implementation. Because of the modulo operation, the update of shared_id cannot be done with a single atomic operation.
Is it possible that GCC, compiling with -O3, optimizes the code so that the assignment to current_id is moved before lock.Acquire(), i.e. out of the critical section?