Hi,

Our webapp collects a huge amount of data about user actions, network busyness, database load, etc.

All data is stored in warehouses and we have quite a lot of interesting views on this data.

If something odd happens, chances are it shows up somewhere in the data.

However, to detect manually whether something out of the ordinary is going on, one has to look through this data continually and search for oddities.

My question: what is the best way to detect changes in dynamic data that can be seen as 'out of the ordinary'?

Are Bayesian filters (I've seen these mentioned when reading about spam detection) the way to go?

Any pointers would be great!

EDIT: To clarify: the data shows, for example, a daily curve of database load. This curve typically looks similar to the curve from yesterday. Over time this curve might change slowly.

It would be nice if a warning could go off when the day-to-day change in the curve falls outside some parameters.

R

+2  A: 

This depends so much on what the data is. Take a statistics class and learn the basics first. This isn't usually an easy or simple problem.

Wahnfrieden
great answer. Really helpful ;^)
Toad
Or even a well-posed question. Just what exactly do you mean by anomalous anyways?
Carlos Rendon
+1  A: 

Bayesian classification might help you find some anomalies in your data, depending on the type of data and how well you train your Bayesian filter.

There is even one available as a web service at uClassify.com.

Alix Axel
+2  A: 

Take a look at control charts. They provide a way to track changes in your data visually and to specify when the data is "out of control" or "anomalous". They are heavily used in manufacturing for quality control.
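As a rough illustration of the idea, here is a minimal Python sketch of the simplest kind of control chart check: compute a center line and limits from a baseline period, then flag new readings that fall outside them. The function names, the sample numbers, and the choice of limits at 3 standard deviations are all assumptions made for this example; real data (such as a daily load curve with a strong time-of-day pattern) would probably need a separate baseline per hour or a more careful model.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from baseline readings.

    k is the number of standard deviations for the limits;
    3 is a common convention, but it is an assumption here.
    """
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - k * spread, center, center + k * spread

def out_of_control(baseline, new_values, k=3.0):
    """Return (index, value) pairs of new readings outside the limits."""
    lower, _, upper = control_limits(baseline, k)
    return [(i, v) for i, v in enumerate(new_values) if v < lower or v > upper]

# Hypothetical database load readings taken at the same hour on past days.
baseline_load = [52, 48, 50, 51, 49, 53, 47, 50, 52, 48]
# Hypothetical new readings to check against that baseline.
todays_load = [51, 49, 75, 50, 20]

flagged = out_of_control(baseline_load, todays_load)
print(flagged)
```

With these made-up numbers the baseline mean is 50 and the sample standard deviation is 2, so the limits are 44 and 56, and the readings 75 and 20 are flagged. To follow a curve that drifts slowly over time, one option is to recompute the baseline over a sliding window of recent days.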

Carlos Rendon
+2  A: 

This question is impossible to answer without knowing much more about the particular data you have. For an overview of what kinds of approaches exist, see Anomaly Detection: A Survey by Chandola, Banerjee, and Kumar.

Jouni K. Seppänen