I'm trying to wrap my head around this task and wondering if there is a standard way of doing this or some libraries that would be useful.

Certain events are tracked and timed at several data sources S1 ... SN. The recorded information is the event type and a timestamp. Several events of the same type may occur back to back, or they may be intermittent. There can be "missing" events (a source fails to record an event that did happen) and, conversely, "false positives" (a source records an event that did not happen). There is typically a time difference between observations of the same event at different sources. This difference has a constant component due to the physical location of the sources, but also a varying component introduced by network latency and other factors.
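To make the setup concrete, this is roughly the toy model I have in mind (a Python sketch; the source names, offsets, jitter, and miss probability are made-up numbers, not real data):

```python
import random

# Illustrative numbers only: per-source constant offsets (seconds) and jitter.
SOURCE_OFFSETS = {"S1": 0.0, "S2": 0.8, "S3": 2.5}
JITTER_STDDEV = 0.3

def observe(true_time, source, miss_prob=0.05):
    """Timestamp a given source would record for an event occurring at
    `true_time`, or None if the source misses the event entirely."""
    if random.random() < miss_prob:
        return None
    return true_time + SOURCE_OFFSETS[source] + random.gauss(0.0, JITTER_STDDEV)
```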

I need an algorithm that finds the optimal maximum time interval for grouping the observations from all sources into a single "observed event", while still allowing missing events and false positives to be detected.
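The naive approach I can picture is greedy grouping with a fixed window; the sketch below is just my guess at how that could work, with the window being exactly the parameter I don't know how to choose:

```python
from collections import namedtuple

Observation = namedtuple("Observation", ["source", "event_type", "timestamp"])

def group_observations(observations, sources, window):
    """Greedily assign each observation (in time order) to the first open
    group of the same event type whose anchor observation lies within
    `window` seconds and which does not yet contain this source."""
    groups = []
    for obs in sorted(observations, key=lambda o: o.timestamp):
        for group in groups:
            anchor = group[0]
            if (obs.event_type == anchor.event_type
                    and obs.timestamp - anchor.timestamp <= window
                    and obs.source not in {m.source for m in group}):
                group.append(obs)
                break
        else:
            groups.append([obs])

    # A group missing some sources suggests missed detections; a group seen
    # by only one source is a candidate false positive at that source.
    return [{
        "event_type": g[0].event_type,
        "start": g[0].timestamp,
        "missing_sources": sorted(set(sources) - {m.source for m in g}),
        "single_source": len(g) == 1,
    } for g in groups]
```

The obvious problem is that the result is very sensitive to `window`: too small and one real event splits into several groups, too large and distinct events merge, which is why I am looking for a principled way to pick it.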

I am wondering whether the solution really lies in statistics rather than algorithms. Any input would be much appreciated.