Briefly, a manual reset event is a synchronization construct which is in either the "signaled" or the "nonsignaled" state. While the event is signaled, any thread which calls a wait function on it will not block, and execution continues unaffected. Any and all threads which call a wait function on a nonsignaled event will block until the event enters the signaled state.
The transition between the signaled and nonsignaled states occurs only as a result of explicit calls to functions such as SetEvent and ResetEvent.
I've built a synchronization mechanism on Windows which uses both these manual reset events and their auto-reset siblings. The auto-reset mechanism can be easily replicated with a semaphore, but I'm struggling to find an equivalent for the manual-reset variety.
In particular, while a condition variable with "notify all" functionality might appear similar at first glance, it has considerably different (perhaps non-functional) behavior once you consider that it requires an associated mutex. First, before a thread can wait on a condvar, it must acquire the associated mutex. In addition to the cost of acquiring and releasing the mutex, this unnecessarily serializes all the threads which are about to wait. On wake, even though all threads are notified, only one thread at a time can actually acquire the mutex, which incurs additional performance and concurrency penalties even though the mutex serves no purpose in this case.
The release case is especially poor on a multi-CPU system, because the simultaneous release of all waiters guarantees that the difference between a condvar and a Windows event will be observable. With an Event, N threads will become runnable on an N-CPU system and can run in parallel, while with a condvar, even with an implementation that avoids the thundering herd, the threads can only leak out one at a time through the associated mutex.
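To make the objection concrete, here is a minimal sketch of the condvar-based emulation described above, assuming POSIX threads. The `manual_event_t` type and the `manual_event_*` function names are hypothetical, made up for illustration; they are not part of pthreads or of any library. Note that every waiter has to take `lock` on the way in and again on the way out of `pthread_cond_wait`, which is exactly the serialization complained about above.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type: a manual-reset event emulated with mutex + condvar. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* protected by lock */
} manual_event_t;

/* Returns 0 on success, or a pthreads error code. */
static int manual_event_init(manual_event_t *ev, bool initially_signaled)
{
    int rc = pthread_mutex_init(&ev->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&ev->cond, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&ev->lock);
        return rc;
    }
    ev->signaled = initially_signaled;
    return 0;
}

/* Rough analogue of SetEvent: mark signaled and wake every waiter. */
static void manual_event_set(manual_event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signaled = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Rough analogue of ResetEvent: later waiters will block again. */
static void manual_event_reset(manual_event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signaled = false;
    pthread_mutex_unlock(&ev->lock);
}

/* Blocks until the event is signaled; returns at once if it already is. */
static void manual_event_wait(manual_event_t *ev)
{
    pthread_mutex_lock(&ev->lock);          /* serialization point on entry */
    while (!ev->signaled)                   /* loop guards against spurious wakeups */
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);        /* woken waiters exit one at a time */
}

static void manual_event_destroy(manual_event_t *ev)
{
    pthread_cond_destroy(&ev->cond);
    pthread_mutex_destroy(&ev->lock);
}
```

A waiter would call `manual_event_wait(&ev)` and a controlling thread would call `manual_event_set(&ev)` or `manual_event_reset(&ev)`. One further difference in this sketch: if a set is followed immediately by a reset, a woken thread may re-check `signaled`, find it false, and go back to waiting.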
Any pointers to a construct that better imitates the behavior of manual reset events would be greatly appreciated. The closest I can find is a barrier, which allows multiple threads to approach and be released without synchronizing on a mutex, but the barrier "breaks" based on the waiting thread count rather than on an explicit application call, which is what I need.