I have read many times, here and everywhere on the net, that mutexes are slower than critical sections/semaphores/insert-your-preferred-synchronisation-method-here, but I have never seen any paper or study to back up this claim.

So, where does this idea come from? Is it a myth or a reality? Are mutexes really slower?

+6  A: 

In the book "Multithreading Applications in Win32" by Jim Beveridge and Robert Wiener, it says: "It takes almost 100 times longer to lock an unowned mutex than it does to lock an unowned critical section because the critical section can be done in user mode without involving the kernel."

And on MSDN it says that "critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization".

Mick
So are there any disadvantages of a critical section? I'm really a Java programmer, and in Java the two terms get used synonymously.
Benj
In Java, it's all slow ;-)
Dave Markle
@Benj: I don't think you can use native Win32 Critical Section objects in Java, because once initialized, the Critical Section data structure must not be moved in memory. i.e. it wouldn't survive a compacting GC and it mustn't be passed around by value.
nikie
@Benj -- since a critical section is not a kernel object, it works only within a single process. If you need to synchronize access across several processes, then a mutex is your choice.
atzz
Thanks for the references.
Adrien Plisson
+2  A: 

Yes, critical sections are more efficient. For a very good explanation, get "Concurrent Programming on Windows".

In a nutshell: a mutex is a kernel object, so there is always a switch into kernel mode when you acquire one, even if it is free. A critical section can be acquired without entering the kernel in that case, and (on a multicore/multiprocessor machine) it will even spin for a few cycles if it's blocked, to avoid the expensive context switch.
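That spin-then-block behavior can be sketched in portable C++. This is a simplified analogue, not the actual Win32 implementation; the class name and the spin count are invented, and yielding the timeslice stands in for the real kernel-object wait:

```cpp
#include <atomic>
#include <thread>

// Simplified analogue of a critical section's fast path: try to take the
// lock a bounded number of times in user mode, then give up the timeslice
// (a stand-in for blocking on a kernel object).
class SpinThenBlockLock {
public:
    void lock() {
        int spins = kSpinCount;
        // test_and_set returns the previous value: false means we got it.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            if (--spins < 0)
                std::this_thread::yield();  // real code would block in the kernel here
        }
    }
    void unlock() {
        flag_.clear(std::memory_order_release);
    }

private:
    static const int kSpinCount = 4000;  // cf. InitializeCriticalSectionAndSpinCount
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

In the uncontended case, `lock()` returns after one atomic exchange in user mode, which is the whole reason a critical section can beat a mutex.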

nikie
+1  A: 

A mutex (at least in Windows) allows synchronization between different processes, not just threads, which means extra work must be done to support this. Also, as Brian pointed out, using a mutex requires a switch to kernel mode, which causes another speed hit (I believe, though I have nothing to back this up, that the kernel is required for this interprocess synchronization).

Edit: You can find an explicit reference to interprocess synchronization here; for more info on this topic, have a look at Interprocess Synchronization.

Grant Peters
+7  A: 

I don't believe that any of the answers hit on the key point of why they are different.

Mutexes are at the operating-system level. A named mutex exists and is accessible from ANY process in the operating system (provided its ACL allows access from all).

Critical sections are faster because they don't require a system call into kernel mode; however, they only work WITHIN a single process, so you cannot use a critical section to synchronize more than one process. Depending on what you are trying to achieve and what your software design looks like, you should choose the most appropriate tool for the job.

I'll additionally point out that semaphores are separate from mutexes/critical sections because of their count. A semaphore can be used to control concurrent access to multiple units of a resource, whereas a mutex/critical section is either held or not held.
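The counting behavior can be illustrated with a minimal semaphore in portable C++. This is a hand-rolled illustration with invented names, not a Win32 API (Win32 has CreateSemaphore, and C++20 has std::counting_semaphore for the same purpose):

```cpp
#include <condition_variable>
#include <mutex>

// Minimal counting semaphore, to contrast with a mutex/critical section:
// a mutex is either held or not, while a semaphore admits up to `count`
// concurrent holders.
class Semaphore {
public:
    explicit Semaphore(int count) : count_(count) {}

    void acquire() {                       // block until a unit is free
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return count_ > 0; });
        --count_;
    }

    bool try_acquire() {                   // non-blocking variant
        std::lock_guard<std::mutex> l(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

    void release() {                       // return a unit
        { std::lock_guard<std::mutex> l(m_); ++count_; }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};
```

With `Semaphore sem(2)`, two callers can hold it simultaneously and only the third is refused, whereas a mutex-like lock would already refuse the second.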

Spence
+1 for pointing out the different nature of synchronisation achieved with those 2 constructs.
Adrien Plisson
+1  A: 

A CRITICAL_SECTION is implemented as a spinlock with a capped spin count. See InitializeCriticalSectionAndSpinCount on MSDN for an indication of this.

When the spin count has 'elapsed', the critical section blocks on a semaphore (or whatever kernel lock it is implemented with).

So in code it works roughly like this (simplified pseudocode, not a working implementation):

CRITICAL_SECTION s;

void EnterCriticalSection( CRITICAL_SECTION* s )
{
    int spin_count = s->max_count;
    while( --spin_count >= 0 )
    {
        // InterlockedExchange returns the PREVIOUS value,
        // so 0 means the lock was free and we just took it
        if( InterlockedExchange( &s->Locked, 1 ) == 0 )
        {
           // we own the lock now
           s->OwningThread = GetCurrentThread();
           return;
        }
    }
    // spinning didn't help: block on the kernel object until it is released
    WaitForSingleObject( s->KernelLock, INFINITE );
    s->OwningThread = GetCurrentThread();
}

So if your critical section is only held for a very short time, and the entering thread only has to wait for a few 'spins' (cycles), the critical section can be very efficient. But if this is not the case, the critical section wastes many cycles doing nothing and then falls back to a kernel synchronization object anyway.

So the tradeoff is:

Mutex: slow to acquire/release, but no cycles wasted waiting on long 'locked regions'.

CRITICAL_SECTION: fast to acquire/release when unowned, but cycles wasted spinning on sections that are already owned.
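A tiny harness makes the two ends of that tradeoff concrete in portable C++. The names here are invented and timing instrumentation is left out; the point is only that a pure spinlock (cheap when free, burns CPU while the region is held) and a blocking std::mutex (costlier to acquire, but waiters sleep) are interchangeable for correctness:

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// A pure user-mode spinlock: never blocks, just burns cycles while waiting.
class SpinLock {
public:
    void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) {} }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Runs two threads that each increment a shared counter `iters` times
// under the given lock; returns the final count (2 * iters if the lock
// actually provides mutual exclusion).
template <typename Lock>
long long contend(Lock& lk, int iters) {
    long long counter = 0;
    auto work = [&] {
        for (int i = 0; i < iters; ++i) {
            lk.lock();
            ++counter;          // the "locked region"
            lk.unlock();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Wrapping each `contend` call in a timer would show the spinlock winning for tiny regions and the mutex winning once the locked region grows long enough that spinning waiters just waste their timeslices.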

Christopher
Actually, a mutex wastes more cycles on the context switch than EnterCriticalSection spends spinning. That's the point of the spinning.
nikie
In the worst case, every EnterCriticalSection needs spin_count cycles plus the cycles of the WaitForSingleObject.
Christopher
MSDN says that the spin count is always zero on uniprocessor systems... Does this mean that a critical section is the same as a mutex on those machines?
Adrien Plisson
Not completely. You can easily check whether another thread is currently holding the lock, because only one thread can run at any given time. So if it is locked, we can go directly to the mutex to put our thread to sleep; if it is not locked, we get the same behavior as on a multicore system.
Christopher
You are right, I misread your pseudocode: the kernel-object acquisition only occurs when the critical section is already locked.
Adrien Plisson