views: 98
answers: 1
Is there any good reference on algorithms that people use for rare event detection? Also, how is the time factor taken into account? If I have a case where successive data points tell me something (t_1 to t_n), how can one factor this into a normal machine learning scenario?

Any pointers will be appreciated.

+4  A: 

It may help to describe your scenario more. Since you are trying to find rare events, I assume that you have a working definition of "not rare" (for some problem spaces this is really hard).

For instance, let's say we have a process that is not a random walk, such as CPU utilization for some service. If you wanted to detect rare events, you could take the mean utilization and then look several standard deviations out. Techniques from Statistical Process Control are useful here.
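
For concreteness, here is a minimal Python sketch of that idea, testing new samples against a baseline window with a 3-sigma rule; the utilization numbers and the choice of k=3 are illustrative assumptions, not measurements:

    import statistics

    def is_rare(value, baseline, k=3.0):
        """Flag value if it falls more than k standard deviations from the
        mean of a baseline window (a simple Shewhart-style control chart rule)."""
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline)
        return stdev > 0 and abs(value - mean) > k * stdev

    baseline = [22, 25, 24, 23, 26, 25, 24, 23, 25, 24]  # "normal" CPU utilization (%)
    print(is_rare(26, baseline))   # False: inside the usual band
    print(is_rare(97, baseline))   # True: far outside three standard deviations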

If we have a random walk process, such as stock prices (can of worms opened... please just assume this for the sake of simplicity), the directional movement from t to t+1 is random. A rare event might be a certain number of consecutive moves in a single direction, or a large move in a single direction at a single time step. See Stochastic Calculus for the underlying concepts.
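
A rough sketch of what those two checks might look like, assuming a plain list of prices and made-up thresholds for the run length and the single-step jump:

    def flag_rare_moves(prices, max_run=5, jump_pct=0.05):
        """Flag long runs of same-direction moves and large single-step moves."""
        events, run, prev_sign = [], 0, 0
        for t in range(1, len(prices)):
            change = prices[t] - prices[t - 1]
            sign = (change > 0) - (change < 0)
            run = run + 1 if sign != 0 and sign == prev_sign else 1
            prev_sign = sign
            if run >= max_run:
                events.append((t, "%d consecutive moves in one direction" % run))
            if abs(change) / prices[t - 1] >= jump_pct:
                events.append((t, "large single-step move of %+.2f" % change))
        return events

    prices = [100, 101, 102, 103, 104, 105, 105, 99, 100]
    print(flag_rare_moves(prices))
    # -> [(5, '5 consecutive moves in one direction'), (7, 'large single-step move of -6.00')]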

If a process at step t depends only on step t-1, then we can use Markov chains to model it.
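
As a sketch, assuming a discrete state sequence and a made-up probability cutoff, you could estimate the transition probabilities from the history and flag transitions that the fitted chain considers very unlikely:

    from collections import Counter, defaultdict

    def fit_transitions(sequence):
        """Estimate first-order transition probabilities from a state sequence."""
        counts = defaultdict(Counter)
        for a, b in zip(sequence, sequence[1:]):
            counts[a][b] += 1
        return {a: {b: n / sum(c.values()) for b, n in c.items()}
                for a, c in counts.items()}

    def rare_transitions(sequence, probs, threshold=0.05):
        """Flag transitions whose estimated probability is below the threshold."""
        return [(t, a, b) for t, (a, b) in enumerate(zip(sequence, sequence[1:]))
                if probs.get(a, {}).get(b, 0.0) < threshold]

    history = ["ok"] * 50 + ["degraded", "ok"] * 5 + ["ok"] * 40 + ["failed", "ok"]
    probs = fit_transitions(history)
    print(rare_transitions(history, probs))  # -> [(99, 'ok', 'failed')]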

That is a short list of the mathematical techniques available to you. Now on to machine learning. Why do you want to use machine learning? (It is always good to think about this, to make sure you are not overcomplicating the problem.) Let's assume that you do and that it is the right solution. The actual algorithm you use is not very important at this stage. What you need to do is define what a rare event is. Conversely, you can define what a normal event is and look for things that are not normal; note that these are not the same thing.

Say we produce a set of rare events r1...rn. Each of those rare events will have some features associated with it. For instance, if a computer failed, there might be features like the last time it was seen on the network, its switch port status, etc. This is actually the most important part of machine learning: training set construction. It usually consists of hand-labeling a set of examples to train the model on. Once you have a better understanding of the feature space, you may be able to train another model to do the labeling for you. Repeat this process until you are satisfied.
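
As a sketch of that labeling-and-training step, assuming scikit-learn is available and using invented feature names and toy data (seconds since the machine was last seen on the network, and whether its switch port is up):

    from sklearn.linear_model import LogisticRegression

    # Each row is one hand-labeled example:
    # [seconds_since_last_seen_on_network, switch_port_up (1/0)]
    X = [
        [30,   1],   # labeled normal
        [45,   1],   # labeled normal
        [60,   1],   # labeled normal
        [7200, 0],   # labeled rare event (failure)
        [5400, 0],   # labeled rare event (failure)
    ]
    y = [0, 0, 0, 1, 1]

    model = LogisticRegression()
    model.fit(X, y)
    print(model.predict([[6000, 0], [40, 1]]))  # should come out roughly [1, 0]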

Now, if you are able to define your rare event set, it may be cheaper to simply write heuristics. For detecting rare events I have always found this to work better.
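
A heuristic version of the same thing might be nothing more than a few explicit rules; the thresholds and field names here are hypothetical:

    def rare_event_rules(sample):
        """Return the names of any hand-written rules the sample trips."""
        rules = [
            ("cpu spike",        sample.get("cpu_pct", 0) > 95),
            ("host unreachable", sample.get("seconds_since_seen", 0) > 3600),
            ("switch port down", sample.get("switch_port_up", 1) == 0),
        ]
        return [name for name, hit in rules if hit]

    print(rare_event_rules({"cpu_pct": 98, "seconds_since_seen": 12, "switch_port_up": 1}))
    # -> ['cpu spike']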

Steve
Agreed :). The problem I am trying to figure out is whether there are any signals I can catch prior to the occurrence of these events, so the time factor plays a role here. There were rule-based approaches defined before, but they do not learn when there is a change in the software/HW they are investigating.
AlgoMan
The only thing to do is continually retrain the model. Machine learning works by looking at the past, so it assumes that the future will resemble the past. So you might be able to determine that a process is not normal, but you will probably not be able to classify it into a specific category, since you won't have seen it before. Consider high-frequency trading: they have models that are built to work in most market conditions. When market conditions are not normal they shut down, since they are unsure whether the model will work.
Steve