views:

286

answers:

3

Given a time series of sensor state intervals, how do I implement a classifier which learns from supervised training data to detect an incident based on a sequence of state intervals? To simplify the problem, sensor states are reduced to either true or false.

Update: I've found this paper (PDF) on Mining Sequences of Temporal Intervals which addresses a similar problem. Another paper (Google Docs) on Mining Hierarchical Temporal Patterns in Multivariate Time Series takes a novel approach, but deals with hierarchical data.

Example Training Data

The following data is a training example for an incident, represented as a graph over time, where /¯¯¯\ represents a true state interval and \___/ a false state interval for a sensor.

 Sensor   |  Sensor State over time
          |  0....5....10...15...20...25...  // timestamp
 ---------|--------------------------------
 A        |  ¯¯¯¯¯¯¯¯¯¯¯¯\________/¯¯¯¯¯¯¯¯
 B        |  ¯¯¯¯¯\___________________/¯¯¯¯
 C        |  ______________________________  // no state change
 D        |  /¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯
 E        |  _________________/¯¯¯¯¯¯¯¯\___

Incident Detection vs Sequence Labeling vs Classification

I initially generalised my problem as a two-category sequence labeling problem, but my categories really represented "normal operation" and a rare "alarm event" so I have rephrased my question as incident detection. Training data is available for "normal operation" and "alarm incident".

To reduce problem complexity, I have discretized sensor events to boolean values, but this need not be the case.

Possible Algorithms

A hidden Markov model seems to be a possible solution, but would it be able to use the state intervals? If a sequence labeler is not the best approach for this problem, alternative suggestions would be appreciated.

Bayesian Probabilistic Approach

Sensor activity will vary significantly by time of day (busy in mornings, quiet at night). My initial approach would have been to measure normal sensor state over a few days and calculate state probability by time of day (hour). The combined probability of sensor states at an unlikely hour surpassing an "unlikelihood threshold" would indicate an incident. But this seemed like it would raise a false alarm if the sensors were noisy. I have not yet implemented this, but I believe that approach has merit.

Feature Extraction

Vector states could be represented as state interval changes occurring at a specific time and lasting a specific duration.

struct StateInterval
{
    int sensorID;
    bool state;
    DateTime timeStamp;
    TimeSpan duration; 
}

eg. Some State Intervals from the process table:

[ {D, true, 0, 3} ]; [ {D, false, 4, 1} ]; ...
[ {A, true, 0, 12} ]; [ {B, true, 0, 6} ]; [ {D, true, 0, 3} ]; etc.

A good classifier would take into account state-value intervals and recent state changes to determine if a combination of state changes closely matches training data for a category.

Edit: Some ideas after sleeping on how to extract features from multiple sensors' alarm data and how to compare it to previous data...

Start by calculating the following data for each sensor for each hour of the day:

  • Average state interval length (for true and false states)
  • Average time between state changes
  • Number of state changes over time

Each sensor could then be compared to every other sensor in a matrix with data like the following:

  • Average time taken for sensor B to change to a true state after sensor A did. If an average value is 60 seconds, then a 1-second wait would be more interesting than a 120-second wait.
  • Average number of state changes sensor B underwent while sensor A was in one state

Given two sets of training data, the classifier should be able to determine from these feature sets which is the most likely category for classification.

Is this a sensible approach and what would be a good algorithm to compare these features?


Edit: the direction of a state change (false->true vs true-false) is significant, so any features should take that into account.

+1  A: 

This doesn't sound like a classification problem. Classifiers aren't really meant to take into account "a combination of state changes." It sounds like a sequence labeling problem. Look into using a Hidden Markov Model or a Conditional Random Field. You can find an efficient implementation of the latter at http://leon.bottou.org/projects/sgd.

Edit: I've read through your question in a little more detail, and I don't think and HMM is the best model given what you want to do with features. It's going to blow up your state space and could make inference intractable. You need a more expressive model. You could look at Dynamic Bayesian Networks. They generalize HMMs by allowing the state space to be represented in factored form. Kevin Murphy's dissertation is the most thorough resource for them I've come across.

I'll still like CRFs though. Just as an easy place to start, define one with the time of day and each of the sensor readings as the features for each observation and use bigram feature functions. You can see how it performs and increase complexity of your features from there. I would start simple though. I think you're underestimating how difficult some of your ideas will be to implement.

Kevin
+1 for naming my problem more accurately: sequence labeling. I am looking into HMMs.
FreshCode
+2  A: 

A simple solution would be collapse the time aspect of your data and take each timestamp as one instance. In this case, the values of the sensors are considered your feature vector, where each time step is labeled with a class value of category A or B (at least for the labeled training data):

   sensors      | class
A  B  C  D  E   |
------------------------- 
1  1  1  0  0   |  catA
1  0  0  0  0   |  catB
1  1  0  1  0   |  catB
1  1  0  0  0   |  catA
..

This input data is fed to the usual classification algorithms (ANN, SVM, ...), and the goal is to predict the class of unlabeled time series:

   sensors      | class
A  B  C  D  E   |
------------------------- 
0  1  1  1  1   |   ?
1  1  0  0  0   |   ?
..

An intermediary step of dimensionality reduction / feature extraction could improve the results.

Obviously this may not be as good as modeling the time dynamics of the sequences, especially since techniques such as Hidden Markov Models (HMM) take into account the transitions between the various states.


EDIT

Based on your comment below, it seems that the best way to get less transitory predictions of the target class is to a apply a post-processing rule at the end of the prediction phase, and treating the classification output as a sequence of consecutive predictions.

The way this works is that you would compute the class posterior probabilities (ie: probability distribution that an instance belong to each class label, which in the case of binary SVM are easily derived from the decision function), then given a specified threshold, you check if the probability of the predicted class is above that threshold: if it is we use that class to predict the current timestamp, if not then we keep the previous prediction, and the same goes for future instances. This has the effect of adding a certain inertia to the current prediction.

Amro
Even though in my particular use-case ignoring the time-series and using a probabilistic state model could eventually be the most elegant solution, I specifically ask to classify a time-series. The sequence of state changes are significant and from this it should be inferred that sensor state changes (as opposed to simply state values) are not conditionally independent. Kevin has correctly identified my question as a sequence labeling problem.
FreshCode
I'm experimenting with this on an SVM. The problem I'm running into is that some sensor states are transitory while others are more persistent and if the transitory states had overlapped with more events, it would be classified differently. So, I'll have to "latch" local maxima and minima somehow and use those states instead.
FreshCode
@FreshCode: I added a note on how to get more "stable" predictions
Amro
+1  A: 

Why reinvent the wheel? Check out TClass

If that doesn't cut it for you, you can find there also a number of pointers. I hope this helps.

piccolbo
Hm, didn't spot this one. I'll have to use IKVM to run it from C#, though.
FreshCode
Is there a similar library in C#?
FreshCode
What is C#? Seriously, for us survivors of the COM era, that's off limits stuff (no kids, it's not .com it's COM).
piccolbo