The documentation for the Win32 API PulseEvent() function (kernel32.dll) states that this function is “… unreliable and should not be used by new applications. Instead, use condition variables”. However, condition variables cannot be used across process boundaries like (named) events can.

I have a scenario that is cross-process and cross-runtime (native and managed code), in which a single producer occasionally has something interesting to make known to zero or more consumers. Right now, the producer calls PulseEvent on a well-known named event whenever it needs to make something known. Zero or more consumers wait on that event (WaitForSingleObject()) and perform an action in response. There is no need for two-way communication in my scenario, and the producer does not need to know whether the event has any listeners, nor whether the event was successfully acted upon. On the other hand, I do not want any consumer to ever miss an event. In other words, the system needs to be perfectly reliable, but the producer does not need to know whether that is the case. The scenario can be thought of as a “clock ticker”: the producer provides a semi-regular signal for zero or more consumers to count, and all consumers must have the correct count over any given period of time. No polling by consumers is allowed (for performance reasons). The tick interval is short: about 20 ms, though not perfectly regular.

Raymond Chen (The Old New Thing) has a blog post pointing out the “fundamentally flawed” nature of the PulseEvent() function, but I do not see an alternative for my scenario from Chen or in the posted comments.

Can anyone please suggest one?

Please keep in mind that the IPC signal must cross process boundaries on the machine, not simply threads. And the solution needs to have high performance, in that consumers must be able to act within 10ms of each event.

A: 

If I understand your question correctly, it seems like you can simply use SetEvent. It will release one thread. Just make sure it is an auto-reset event.

If you need to allow multiple threads, you could use a named semaphore with CreateSemaphore. Each call to ReleaseSemaphore increases the count. If the count is 3, for example, and 3 threads wait on it, they will all run.

Mark Wilkins
This doesn't seem to match the OP's request to signal all the waiting threads: "The state of an auto-reset event object remains signaled until a single waiting thread is released"
VirtualBlackFox
@VirtualBox: I wasn't completely sure if he wanted multiple threads, but I did add the named semaphore option, which would allow for that scenario.
Mark Wilkins
I want to release all waiting threads in all processes. Typically, there will be 1 thread in each of several processes. The producer does not know how many consumers there should be/are, and the number changes over time.
Bill
+2  A: 

There are two inherent problems with PulseEvent:

  • if it's used with auto-reset events, it releases one waiter only.
  • threads might never be awakened if they happen to be removed from the waiting queue due to an APC at the moment of the PulseEvent.

An alternative is to broadcast a window message and have each listener create a top-level message-only window that listens for this particular message.

The main advantage of this approach is that you don't have to block your thread explicitly. The disadvantage of this approach is that your listeners have to be STA (can't have a message queue on an MTA thread).

The biggest problem with that approach would be that the processing of the event by the listener will be delayed by the amount of time it takes the queue to get to that message.

You can also make sure you use manual-reset events (so that all waiting threads are awakened) and do SetEvent/ResetEvent with some small delay (say 150ms) to give a bigger chance for threads temporarily woken by APC to pick up your event.

Of course, whether any of these alternative approaches will work for you depends on how often you need to fire your events and whether you need the listeners to process each event or just the last one they get.

Franci Penov
As I mentioned in the question, events come every 20ms or so. Many consumers are console Applications or services with no Window or message pump - window messages are not an option.
Bill
Yes, however, the consumers not only need to process each event but they need to do so within 10ms of its occurrence. There may not be a good story for me here; if an APC steals a waiting thread, it seems all I can do is 'record' that I missed an interval.
Bill
Argh, yer stuck with `PulseEvent`, matey. Or just using `SetEvent`/`ResetEvent` really fast, which is essentially the same. @codeka's suggestion might help you with detecting missed events, but it won't improve the reliability much unless the listeners don't "catch up" when they detect they missed an event.
Franci Penov
+2  A: 

The reason PulseEvent is "unreliable" is not so much because of anything wrong in the function itself, just that if your consumer doesn't happen to be waiting on the event at the exact moment that PulseEvent is called, it'll miss it.

In your scenario, I think the best solution is to manually keep the counter yourself. So the producer thread keeps a count of the current "clock tick", and when a consumer thread starts up, it reads the current value of that counter. Then, instead of using PulseEvent, increment the "clock ticks" counter and use SetEvent to wake all threads waiting on the tick. When the consumer thread wakes up, it checks its "clock tick" value against the producer's "clock ticks" and it'll know how many ticks have elapsed. Just before it waits on the event again, it can check to see if another tick has occurred.
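The bookkeeping in that scheme can be sketched as below. This is only a minimal single-process model, not Win32 code: `TickCounter` and `TickConsumer` are hypothetical names, the counter is assumed to live in shared memory in a real system, and the named event that actually wakes the consumer is not modelled at all.

```python
class TickCounter:
    """Stand-in for the producer's counter (assumed to live in shared memory)."""

    def __init__(self):
        self.ticks = 0

    def tick(self):
        # Producer: bump the counter first, then signal the event (not modelled).
        self.ticks += 1


class TickConsumer:
    """Remembers the last counter value it saw so it can detect skipped ticks."""

    def __init__(self, counter):
        self.counter = counter
        # On start-up, read the current value of the producer's counter.
        self.last_seen = counter.ticks

    def elapsed_since_last_wake(self):
        # Called on every wake-up, and again just before waiting.
        current = self.counter.ticks
        elapsed = current - self.last_seen
        self.last_seen = current
        return elapsed


counter = TickCounter()
consumer = TickConsumer(counter)

counter.tick()
first = consumer.elapsed_since_last_wake()   # consumer woke in time

counter.tick()
counter.tick()                               # consumer was not waiting for one of these
second = consumer.elapsed_since_last_wake()  # a value above 1 means a wake-up was missed
```

Calling `elapsed_since_last_wake()` once more right before waiting is the "check to see if another tick has occurred" step; a non-zero result means the consumer should handle that tick instead of blocking.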

I'm not sure if I described the above very well, but hopefully that gives you an idea :)

Dean Harding
Thanks for the answer; I am trying to avoid "polling" behavior on behalf of consumers. I will study your comment a bit. Not sure if it differs much from using PulseEvent and then just checking if I "missed one" when a given consumer wakes up.
Bill
+2  A: 

I think you're going to need something a little more complex to hit your reliability target.

My understanding of your problem is that you have one producer and an unknown number of consumers all of which are different processes. Each consumer can NEVER miss any events.

I'd like more clarification as to what missing an event means.

i) if a consumer started to run and got to just before it waited on your notification method and an event occurred should it process it even though it wasn't quite ready at the point that the notification was sent? (i.e. when is a consumer considered to be active? when it starts or when it processes its first event)

ii) likewise, if the consumer is processing an event and the code that waits on the next notification hasn't yet begun its wait (I'm assuming a Wait -> Process -> Loop to Wait code structure) then should it know that another event occurred whilst it was looping around?

I'd assume that i) is a "not really" as it's a race between process start up and being "ready" and ii) is "yes"; that is notifications are, effectively, queued per consumer once the consumer is present and each consumer gets to consume all events that are produced whilst it's active and doesn't get to skip any.

So, what you're after is the ability to send a stream of notifications to a set of consumers where a consumer is guaranteed to act on all notifications in that stream from the point where it acts on the first to the point where it shuts down. i.e. if the producer produces the following stream of notifications

1 2 3 4 5 6 7 8 9 0

and consumer a) starts up and processes 3, it should also process 4-0

if consumer b) starts up and processes 5 but is shut down after 9 then it should have processed 5,6,7,8,9

if consumer c) was running when the notifications began it should have processed 1-0

etc.

Simply pulsing an event won't work. If a consumer is not actively waiting on the event when it is pulsed, it will miss the event, so we will fail whenever events are produced faster than a consumer can loop around to wait on the event again.

Using a semaphore also won't work. If one consumer runs so much faster than another that it can loop around to the semaphore wait before the other has finished processing, and another notification arrives within that time, then one consumer could process an event more than once and another could miss one. That is, you may well release 3 threads (if the producer knows there are 3 consumers), but you can't ensure that each consumer is released exactly once.
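The semaphore problem can be reproduced deterministically with a scripted interleaving. The sketch below uses Python's in-process `threading.Semaphore` purely as a stand-in for a named Win32 semaphore; `simulate_semaphore_race` and the consumer names are made up for illustration, and the producer is assumed to know there are exactly two consumers.

```python
import threading


def simulate_semaphore_race():
    """Scripted interleaving of one fast and one slow consumer over two ticks."""
    sem = threading.Semaphore(0)
    wakeups = {"fast": 0, "slow": 0}

    def produce_tick(n_consumers=2):
        # Producer releases one permit per consumer it believes exists.
        for _ in range(n_consumers):
            sem.release()

    def try_consume(name):
        # Non-blocking acquire keeps the scenario single-threaded and repeatable.
        if sem.acquire(blocking=False):
            wakeups[name] += 1
            return True
        return False

    produce_tick()                  # tick 1
    try_consume("fast")             # fast handles tick 1
    try_consume("slow")             # slow takes tick 1 and starts long processing
    produce_tick()                  # tick 2 arrives while slow is still busy
    try_consume("fast")             # fast handles tick 2
    try_consume("fast")             # fast loops again and takes slow's permit
    slow_got_tick_2 = try_consume("slow")  # slow finally returns: nothing left
    return wakeups, slow_got_tick_2


wakeups, slow_got_tick_2 = simulate_semaphore_race()
```

Two ticks were produced, yet the fast consumer was released three times and the slow one only once: the permits carry no identity, so nothing ties a release to a particular consumer.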

A ring buffer of events (tick counts) in shared memory with each consumer knowing the value of the event it last processed and with consumers alerted via a pulsed event should work at the expense of some of the consumers being out of sync with the ticks sometimes; that is if they miss one they will catch up next time they get pulsed. As long as the ring buffer is big enough so that all consumers can process the events before the producer loops in the buffer you should be OK.

With the example above, if consumer d misses the pulse for event 4 because it wasn't waiting on its event at the time, and it then settles into a wait, it will be woken when event 5 is produced; since its last processed count is 3, it will process 4 and 5 and then loop back to the event...
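A rough model of that ring buffer and the catch-up step follows. It is a single-process sketch under stated assumptions: `TickRing` and `RingConsumer` are hypothetical names, the slots and the head sequence number would sit in shared memory in the real system, and the pulsed event that wakes a consumer is left out.

```python
class TickRing:
    """Fixed-size ring of event payloads, indexed by a growing sequence number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0  # sequence number of the newest event (0 means none yet)

    def publish(self, payload):
        # Producer: store the payload, then pulse the event (not modelled).
        self.head += 1
        self.slots[self.head % self.capacity] = payload
        return self.head


class RingConsumer:
    def __init__(self, ring):
        self.ring = ring
        self.last = ring.head  # only events after start-up are of interest

    def drain(self):
        """Run on each wake-up; returns (events, lost_count)."""
        head = self.ring.head
        # Oldest event that has not yet been overwritten by the producer.
        oldest = max(self.last + 1, head - self.ring.capacity + 1)
        lost = oldest - (self.last + 1)
        events = [(seq, self.ring.slots[seq % self.ring.capacity])
                  for seq in range(oldest, head + 1)]
        self.last = head
        return events, lost


ring = TickRing(capacity=8)
d = RingConsumer(ring)
for n in (1, 2, 3):
    ring.publish(f"event {n}")
d.drain()                      # d has now processed 1, 2 and 3

ring.publish("event 4")        # d misses this pulse
ring.publish("event 5")        # d is woken by this one
caught_up, lost = d.drain()    # d processes 4 and 5 together
```

The `lost` count is non-zero only when the producer has lapped the consumer, which is the "ring buffer big enough" condition: at roughly 20 ms per tick, a ring of N slots tolerates a consumer stall of about N × 20 ms before events are overwritten.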

If this isn't good enough then I'd suggest something like PGM via sockets to give you a reliable multicast; the advantage of this would be that you could move your consumers off onto different machines...

Len Holgate
Thanks. You have described my situation very well. Indeed, this is exactly what I have now - a counter in shared memory, a ring buffer with event data, and a consumer that "knows" if it missed an event occasionally because it was not waiting when the PulseEvent occurred. I was "fishing" here on StackOverflow to see if I can do better. We miss about 1 or 2 events per week (running 24/7 at 20ms per event and about 10 consumers), which is nearly perfect. A "missed event" cannot be acted upon - it is only valid for a brief moment.
Bill
PGM would likely give you too much latency. Given that missed events can't be acted on, I'm tempted to say that you're operating as intended. If the consumer isn't ready for a new event when one occurs and it can't catch up, then surely all it can do is skip that event, which is what is happening.
Len Holgate