A CRITICAL_SECTION is implemented as a spinlock with a capped spin count. The MSDN documentation for InitializeCriticalSectionAndSpinCount hints at this.
Once the spin count is exhausted, the critical section falls back to waiting on a semaphore (or whatever kernel lock it is implemented with).
So in code it works roughly like this (not the real implementation, just an illustration with made-up field names):
CRITICAL_SECTION s;
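void EnterCriticalSection( CRITICAL_SECTION* s )
{
    int spin_count = s->max_count;
    while( --spin_count >= 0 )
    {
        // InterlockedExchange returns the previous value: 0 means the lock was free
        if( InterlockedExchange( &s->Locked, 1 ) == 0 )
        {
            // we own the lock now
            // (GetCurrentThread() only returns a pseudo handle, so use the thread id)
            s->OwningThread = GetCurrentThreadId();
            return;
        }
    }
    // spinning did not help: block on the kernel object and wait for an unlock
    // (WaitForSingleObject takes the handle itself, not a pointer to it)
    WaitForSingleObject( s->KernelLock, INFINITE );
    s->OwningThread = GetCurrentThreadId();
}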
So if your critical section is only held for a very short time, and the entering thread only has to wait a few 'spins' (cycles), the critical section can be very efficient. If that is not the case, the critical section wastes many cycles spinning without getting any work done, and then falls back to a kernel synchronization object anyway.
So the tradeoff is:
Mutex: slow acquire/release, but no wasted cycles for long 'locked regions'.
CRITICAL_SECTION: fast acquire/release for unowned 'regions', but wasted cycles for owned sections.
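If you want to play with the spin-then-block idea in something that actually compiles, here is a minimal portable sketch. It is not how Windows implements CRITICAL_SECTION; it only mimics the behaviour described above using the C++ standard library. The class name SpinThenBlockLock, the max_spins parameter and the run_demo helper are my own inventions for illustration, and a std::mutex plus std::condition_variable stand in for the kernel object.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin-then-block lock (illustration only, not the Win32 layout).
class SpinThenBlockLock {
public:
    explicit SpinThenBlockLock(int max_spins) : max_spins_(max_spins) {}

    void lock() {
        // Fast path: try a bounded number of times without sleeping.
        for (int i = 0; i < max_spins_; ++i) {
            if (try_acquire()) return;
        }
        // Slow path: sleep until unlock() signals, then retry the acquire.
        std::unique_lock<std::mutex> guard(wait_mutex_);
        wake_.wait(guard, [this] { return try_acquire(); });
    }

    void unlock() {
        assert(locked_.load() && "unlock() called on a lock that is not held");
        {
            // Clear the flag under wait_mutex_ so a thread that is about to
            // sleep cannot miss the wake-up.
            std::lock_guard<std::mutex> guard(wait_mutex_);
            locked_.store(false);
        }
        wake_.notify_one();
    }

private:
    bool try_acquire() {
        // exchange() returns the previous value; false means the lock was free.
        return !locked_.exchange(true);
    }

    const int max_spins_;
    std::atomic<bool> locked_{false};
    std::mutex wait_mutex_;          // stand-in for the kernel object
    std::condition_variable wake_;
};

// Hypothetical helper: several threads increment a shared counter under the lock.
long run_demo(int threads, int iterations, int max_spins) {
    SpinThenBlockLock lock(max_spins);
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;   // the (very short) protected region
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling run_demo with max_spins set to 0 forces every contended acquire onto the blocking path, which roughly corresponds to the plain mutex side of the tradeoff; a larger value lets short waits be resolved by spinning. Note that this sketch takes the internal mutex on every unlock to keep the wake-up logic simple, so its release is more expensive than it would need to be; presumably a real implementation keeps track of whether anyone is waiting at all.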