ansaurus

Question

Intelligent Database - Capable of identifying out of the ordinary values

Answer 1

A:

I don't have MySQL installed at the moment but I guess the first can be achieved with a query similar to this (off the top of my head, not tested, could not work at all):

SELECT name, salary FROM emp WHERE salary>(SELECT AVG(salary) FROM emp);

Or, a more complex query would be:

SELECT name, salary from emp WHERE salary - (SELECT AVG(salary) FROM emp) >
        (SELECT AVG(salary - (SELECT AVG(salary) FROM emp)) FROM emp);

The 2nd one basically selects the employees whose salaries differ from the average of the salaries by more than the average of the difference in all the employees' salaries.

Lemme know if it works.

Leo Jweda 2010-01-17 12:45:45

Also, creating a view could be helpful if the query is recurring.

Agos 2010-01-17 13:23:21

Answer 2

A:

The hard part is defining "out of the ordinary."

What you're trying to do is what fraud detection software for figuring out when somebody is laundering money is all about. Your simple example is an easy one. The more complex ones are done with databases, statistics, data mining, and rules engines that contain lots of rules. It's not an easy problem, unless you want to restrict yourself to the trivial case that you cited.

If you manage to turn it into an easy problem, you'll be a wealthy person. Good luck.

duffymo 2010-01-17 13:20:10

Answer 3

A:

You could use Analysis Services and a data mining model.

Obviously you'd have to adapt the code, but here's a sample from Microsoft:

http://www.sqlserverdatamining.com/ssdm/Default.aspx?tabid=101&Id=83

"This sample shows how the clustering algorithm can be used to perform automatic data validation through the use of the PredictCaseLikelihood() function. To exercise the sample, enter values into the form and click the submit button. If the combination of values has a reasonable likelihood, the form will accept the values. If not, additional elements of the prediction query indicate the value likely to be unacceptable. Checking the “Show Details” box on the form will show the query that was sent in addition to the probability ratios used to determine the outlying values."

Shane Cusson 2010-01-18 05:22:07

Answer 4

A:

There are different methods for finding outliers: distance-based, cluster-based, etc.

You could use Data Applied's outlier detection or clustering analytics. The first one automatically finds records which are most different from their N closest neighbors. The second finds large groups (clusters) of records, and identifies records which don't fit well any cluster. They make it free for small data sets, and it's online (http://www.data-applied.com). You don't have to write code, but you can use their Web API if you want.

Mark Senizer 2010-01-20 18:21:24

ansaurus

tags:

views:

answers:

Intelligent Database - Capable of identifying out of the ordinary values

related questions