ansaurus

Question

Answer 1

+2 A:

PostgreSQL can also calculate the standard deviation.

You could keep only the data points that lie within avg() +/- 2*stddev(), which for normally distributed data roughly corresponds to the 95% of data points closest to the average.

Of course 2 can also be 3 (about 99.7%) or 6 (more than 99.9999%), but do not get hung up on the numbers, because in the presence of a collection of outliers you are no longer dealing with a normal distribution.

Be very careful and validate that it works as expected.
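
A minimal sketch of that filter, assuming a hypothetical table mytable with a numeric column value (both names are placeholders, not taken from the question). It returns the plain and the trimmed average side by side, plus row counts, so you can check how many points the 2*stddev() band actually drops:

```sql
-- Assumption: mytable(value) is a placeholder for the real table and column.
WITH stats AS (
    -- Mean and sample standard deviation over the whole table.
    SELECT avg(value) AS mean, stddev(value) AS sd
    FROM mytable
)
SELECT count(*) AS total_rows,
       -- Rows inside mean +/- 2 standard deviations.
       count(CASE WHEN t.value BETWEEN s.mean - 2 * s.sd AND s.mean + 2 * s.sd
                  THEN 1 END) AS kept_rows,
       avg(t.value) AS plain_avg,
       -- Average over the kept rows only (CASE yields NULL otherwise,
       -- and avg() ignores NULLs).
       avg(CASE WHEN t.value BETWEEN s.mean - 2 * s.sd AND s.mean + 2 * s.sd
                THEN t.value END) AS trimmed_avg
FROM mytable AS t
CROSS JOIN stats AS s;
```

Note that stddev() returns NULL when there is only one row, in which case kept_rows is 0 and trimmed_avg is NULL. The factor 2 is the knob discussed above.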

Peter Tillemans 2010-05-29 08:59:07

This sounds good! I didn't know stddev would result in percentages of the set although it sounds perfectly legit. I know if I combine your answer with the one by Rodger, I must be on the right track!

milovanderlinden 2010-05-30 13:04:27

Answer 2

+1 A:

"I cannot say: if a value is over X, it has to be eliminated."

Well, you could use HAVING and a subselect to eliminate outliers, something like:

HAVING value < (
 SELECT 2 * avg(value)
 FROM   mytable
 GROUP BY ...
)

(Or, for that matter, use a more complex version to eliminate anything above 2 or 3 standard deviations if you want something that will be better at eliminating only outliers.)
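
One possible way to turn the fragment above into a complete query, as a sketch only. It assumes a hypothetical table mytable with a numeric column value and a hypothetical grouping column grp. It also swaps HAVING for WHERE with a correlated subselect, because HAVING filters whole groups after aggregation, while the goal here is to drop individual rows before they are averaged:

```sql
-- Assumption: mytable(grp, value) is a placeholder schema.
SELECT t.grp,
       avg(t.value) AS avg_without_outliers
FROM mytable AS t
WHERE t.value < (
    -- Threshold per group: twice that group's own average.
    SELECT 2 * avg(i.value)
    FROM mytable AS i
    WHERE i.grp = t.grp
)
GROUP BY t.grp;
```

The "2 * avg" threshold is as crude as in the fragment it is based on; it only makes sense for positive values with outliers on the high side.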

The other option is to look at generating a median value, which is a fairly statistically sound way of accounting for outliers; happily there are three reasonable examples of just that: one from the PostgreSQL Wiki, one built as an Oracle compatibility layer, and another from the PostgreSQL Journal. Note the caveats around how precisely/accurately they implement medians.
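
For what it is worth, PostgreSQL 9.4 and later ship ordered-set aggregates, so on those versions a median no longer needs a custom aggregate. This is a different technique from the three examples mentioned above, and it was not available when this thread was written. A minimal sketch, again assuming a hypothetical table mytable with a numeric column value:

```sql
-- Requires PostgreSQL 9.4 or later.
-- Assumption: mytable(value) is a placeholder for the real table and column.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median
FROM mytable;
```

percentile_cont interpolates between the two middle values when the row count is even; percentile_disc(0.5) returns an actual value from the set instead.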

Rodger 2010-05-29 10:28:50

Excellent answer, especially the wiki page on aggregate median! I will however, as Peter Tillemans suggests, combine it with the stddev. But since your answer contains the most hints, I will rate it as the correct answer.

milovanderlinden 2010-05-30 13:05:09

Answer 3

A:

easy:

SELECT avg(value) from foo where value < 100;

Always read SQL queries starting with the FROM clause, then the WHERE clause; the aggregates are then calculated over the resulting row set.
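
A small self-contained illustration of that reading order, using an inline VALUES list in place of a real foo table (the four numbers are made up for the example):

```sql
-- FROM produces four rows, WHERE drops the 1000,
-- and only then avg() runs over the remaining 10, 20 and 30.
SELECT avg(value)
FROM (VALUES (10), (20), (30), (1000)) AS foo(value)
WHERE value < 100;
```

Without the WHERE clause the 1000 would be included and the average would be 265 instead of 20.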

Janning 2010-05-29 15:57:49

postgresql weighted average?