tags:

views:

53

answers:

2

Is there a standard approach for deduping parallel event streams ? Before I attempt to reinvent the wheel, I want to know if this problem has some known approaches.

My client component will be communicating with two servers. Each one is providing a near real-time event stream (~1 second). The events may occasionally be out of order. Assume I can uniquely identify the events. I need to send a single stream of events to the consuming code at the same near real-time performance.

A: 

A lot has been written about this kind of problem. Here's a foundational paper, by Leslie Lamport:

http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks

The Wikipedia article on Operational Transformation theory is a perfectly good starting point for further research:

http://en.wikipedia.org/wiki/Operational%5Ftransformation

As for your problem, you'll have to choose some arbitrary weight to measure the cost of delay vs the cost of dropped events. You can maintain two priority queues, time-ordered, where incoming events go. You'd do a merge-and on the heads of the two queues with some delay (to allow for out-of-order events), and throw away events that happened "before" the timestamp of whatever event you last sent. If that's no better than what you had in mind already, well, at least you get to read that cool Lamport paper!

Jonathan Feinberg
The paper was interesting and highlights the complexity of the problem. I'm looking to see if there are some other options given some domain specific aspects of my problem that might be a bit easier to implement. Thank you.
Jim Rush
Oh. Well, the last paragraph of my answer is meant to be that easy-to-implement solution!
Jonathan Feinberg
+1  A: 

I think that the optimization might be OS-specific. From the task as you described it I think about two threads consuming incoming data and appending it to the common stream having access based on mutexes. Both Linux and Win32 have mutex-like procedures, but they may have slow performance if you have data rate is really great. In this case I'd operate by blocks of data, that will allow to use mutexes not so often. Sure there's a main thread that consumes the data and it also access it with a mutex.

Maksee