Note that I can conduct the research inside the boost source code, and may do this to answer my own curiosity if there isn't anyone out there with an answer.

I ask, however, because someone may already have done this comparison and be able to answer authoritatively.

It would seem that by sharing a memory-mapped file between processes and building on InterlockedIncrement(), one could create a largely user-mode mutex akin to a CRITICAL_SECTION, which would be considerably more performant than the Win32 Mutex for interprocess synchronisation.

So my expectation is that boost::interprocess_mutex has probably been implemented in this manner on Win32, and that it is substantially quicker than the native API offering.

However, this is only a supposition: I have not field-tested the performance of boost::interprocess_mutex for interprocess synchronisation, nor investigated its implementation deeply.

Does anyone have experience using it or profiling its relative performance, or can anyone comment on the safety of using InterlockedIncrement() across processes via shared memory?

+1  A: 

It would seem that by sharing a memory-mapped file between processes and building on InterlockedIncrement(), one could create a largely user-mode mutex akin to a CRITICAL_SECTION, which would be considerably more performant than the Win32 Mutex for interprocess synchronisation.

CRITICAL_SECTION internally can use a synchronization primitive when there's contention. I forget if it's an event, semaphore, or mutex.

You can "safely" use Interlocked functions on shared memory, so there's no reason why you couldn't use them for cross-process synchronization, other than that it would be really crazy and you should probably either use threads or a real synchronization primitive.

But officially, you can.

MSN
This is what I believe too; I just can't find much knowledge on using interlocked increment for interprocess synchronisation on Windows. Boost hints that it does it. It may be a slightly obscure question, as shared memory is not a technique that is often used on the Win32 platform, given the prevalence of, and development expertise with, the threading framework.
polyglot
http://msdn.microsoft.com/en-us/library/ms684122(VS.85).aspx Way down in the last sentence of the first paragraph under "The Interlocked API", it says you can use them for shared memory across processes.
MSN
"The threads of different processes can use these functions if the variable is in shared memory." Well spotted, thank you. :) I wonder if there are some people that have used this facility in their implementations?
polyglot
One problem with such implementations is that you need to handle terminated processes manually. That is, if one process locks a shared mutex and then terminates due to an error, the mutex stays locked. As the operating system no longer manages this problem, the mutex will stay locked forever, so one needs to store the owner etc. too. A really safe implementation is a lot of work. Boost.Interprocess does not provide such features on Win32.
gimpf
No kidding. That's why you might as well use a win32 mutex instead of rolling your own. Any "high-performance" cross process communication will not likely be high performance anyway.
MSN
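
To illustrate the "store the owner" idea from the comments above, here is a minimal portable sketch. It is not what Boost.Interprocess or Win32 does. The names owner_tracking_lock, try_recover and the is_alive callback are hypothetical, std::atomic stands in for an Interlocked operation on a word in shared memory, and how liveness is actually checked is left to the caller.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical sketch: the lock word holds the owner's id, 0 meaning "unlocked".
// In a real cross-process setup this struct would live in shared memory.
struct owner_tracking_lock {
    std::atomic<std::uint32_t> owner{0};

    // Acquire only if nobody holds the lock; self must be non-zero.
    bool try_lock(std::uint32_t self) {
        assert(self != 0);
        std::uint32_t expected = 0;
        return owner.compare_exchange_strong(expected, self,
                                             std::memory_order_acquire);
    }

    // Release only if we are the recorded owner.
    bool unlock(std::uint32_t self) {
        std::uint32_t expected = self;
        return owner.compare_exchange_strong(expected, 0,
                                             std::memory_order_release);
    }

    // Take the lock over if the recorded owner is reported dead.
    // is_alive is an assumption: the caller decides how to test liveness.
    bool try_recover(std::uint32_t self,
                     const std::function<bool(std::uint32_t)>& is_alive) {
        std::uint32_t seen = owner.load(std::memory_order_acquire);
        if (seen == 0 || is_alive(seen)) {
            return false;  // lock is free, or the owner is still running
        }
        // Succeeds only if the dead owner is still the recorded owner.
        return owner.compare_exchange_strong(seen, self,
                                             std::memory_order_acq_rel);
    }
};
```

Even with this, the data protected by the lock may have been left half-updated by the dead owner, and process ids can be reused, which is part of why a really safe implementation is so much work. A Win32 mutex reports the abandoned case to the next waiter (WAIT_ABANDONED) without any of this.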
+3  A: 

In boost 1.39.0, there is only specific support for pthreads. On all other platforms, it becomes a busy-loop with a yield call in the middle (essentially the same system that you describe). See boost/interprocess/sync/emulation/interprocess_mutex.hpp. For example, here's the implementation of lock():

inline void interprocess_mutex::lock(void)
{
   do{
      boost::uint32_t prev_s = detail::atomic_cas32(const_cast<boost::uint32_t*>(&m_s), 1, 0);

      if (m_s == 1 && prev_s == 0){
            break;
      }
      // relinquish current timeslice
      detail::thread_yield();
   }while (true);
}

What this means is that a contended boost::interprocess::interprocess_mutex on Windows is VERY expensive, although the uncontended case is almost free. This could potentially be improved by adding an event object or similar to sleep on, but this would not fit well with boost::interprocess's API, as there would be nowhere to put the per-process HANDLE needed to access the mutex.
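
For anyone who wants to experiment with the behaviour described above without the Boost headers, here is a minimal portable sketch of the same spin-and-yield idea. It is not the Boost source: std::atomic stands in for detail::atomic_cas32, std::this_thread::yield() for detail::thread_yield(), and the names spin_yield_mutex and contended_increments are made up for the example. It uses threads in one process; placing the struct in shared memory is assumed to work the same way only where the atomic is lock-free.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the emulated mutex: one 32-bit word, 0 = free, 1 = held.
struct spin_yield_mutex {
    std::atomic<std::uint32_t> state{0};

    bool try_lock() {
        std::uint32_t expected = 0;
        // Atomically swap 0 -> 1; fails if another thread already holds the lock.
        return state.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire);
    }

    void lock() {
        while (!try_lock()) {
            // Give up the rest of the timeslice instead of sleeping on a kernel object.
            std::this_thread::yield();
        }
    }

    void unlock() {
        assert(state.load(std::memory_order_relaxed) == 1);  // must be held
        state.store(0, std::memory_order_release);
    }
};

// Runs `threads` workers that each increment a plain counter `iterations`
// times under the lock, and returns the final count.
inline long contended_increments(int threads, int iterations) {
    spin_yield_mutex m;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&m, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Timing contended_increments with one thread versus several is a quick way to see the gap between the uncontended and contended cases on a given machine.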

bdonlan
Thanks for your comments. One almost needs "process-local storage" (as opposed to thread-local storage), but it's not a use case that should almost ever come up.
polyglot
Process local storage is malloc() or new :)
bdonlan