ansaurus

Question

Calculate average without being thrown by strays

Answer 1

A:

You might get some use out of standard deviation here, which measures how spread out the data points are around the average. You can define an outlier as anything more than 1 standard deviation (or whatever other number suits you) from the average, throw those values out, and calculate a new average that doesn't include them.
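
A minimal PHP sketch of this idea, assuming the population standard deviation and a cutoff of $k standard deviations. The function and parameter names are hypothetical, not from any library:

```php
<?php
// Arithmetic mean; assumes a non-empty array.
function mean(array $xs): float {
    return array_sum($xs) / count($xs);
}

// Population standard deviation: square root of the mean squared deviation.
function std_dev(array $xs): float {
    $m = mean($xs);
    $sq = 0.0;
    foreach ($xs as $x) {
        $sq += ($x - $m) ** 2;
    }
    return sqrt($sq / count($xs));
}

// Drop values more than $k standard deviations from the mean, then average the rest.
// $k is a hypothetical tuning parameter; 1.0 matches the cutoff suggested above.
function mean_without_outliers(array $xs, float $k = 1.0): float {
    $m = mean($xs);
    $sd = std_dev($xs);
    $kept = array_filter($xs, fn($x) => abs($x - $m) <= $k * $sd);
    return mean($kept);
}
```

With the sample array(19, 20, 21, 21, 22, 30, 60), the mean is about 27.6 and the standard deviation about 13.7, so a cutoff of k = 1 drops only 60 and the result is 133/6, about 22.17.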

grossvogel 2010-10-19 19:23:25

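This only really works if your data are roughly normally distributed. With a flat (uniform) distribution, a 1-standard-deviation cutoff discards a large share of perfectly ordinary values and may well do something bad to the result.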

Rafe 2010-10-19 22:31:31

Answer 2

A:

You could put the values into an array, sort the array, and then find the median, which is usually a better number than the average anyway because it discounts outliers automatically, giving them no more weight than any other number.
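
To illustrate, here is a small PHP sketch (the function names are hypothetical) that compares the mean and the median on a sample with one stray value, 60. The stray pulls the mean up but leaves the median where it was:

```php
<?php
// Median: sort, then take the middle value, or the average of the two
// middle values when the count is even. Assumes a non-empty array.
function median(array $xs): float {
    sort($xs);
    $n = count($xs);
    $mid = intdiv($n, 2);
    return $n % 2 ? $xs[$mid] : ($xs[$mid - 1] + $xs[$mid]) / 2;
}

// Arithmetic mean, for comparison.
function mean(array $xs): float {
    return array_sum($xs) / count($xs);
}

$data = array(19, 20, 21, 21, 22, 30, 60);
printf("mean = %.2f, median = %.2f\n", mean($data), median($data));
// prints: mean = 27.57, median = 21.00
```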

Robusto 2010-10-19 19:26:17

Answer 3

A:

Here's a pretty naive implementation that you could fix up for your own needs. I purposely kept it pretty verbose. It's based on the five-number summary (minimum, lower quartile, median, upper quartile, maximum), which is often used to identify outliers.

<?php
// Median of an array of numbers; returns null for an empty array.
function get_median($arr) {
    sort($arr);
    $n = count($arr);
    if ($n == 0) {
        return null;
    }
    $mid = intdiv($n, 2);
    if ($n % 2 == 0) {
        // even count: average the two middle values
        return ($arr[$mid - 1] + $arr[$mid]) / 2;
    }
    // odd count: take the single middle value
    return $arr[$mid];
}

// Five-number summary: minimum, lower quartile, median, upper quartile, maximum.
// The quartiles are the medians of the lower and upper halves of the sorted
// data; when the count is odd, the middle value belongs to neither half.
function get_five_number_summary($arr) {
    sort($arr);
    $n = count($arr);
    $half = intdiv($n, 2);
    $lower_half = array_slice($arr, 0, $half);
    $upper_half = array_slice($arr, $n - $half);
    return array(
        $arr[0],                 // minimum
        get_median($lower_half), // lower quartile (Q1)
        get_median($arr),        // median
        get_median($upper_half), // upper quartile (Q3)
        $arr[$n - 1]             // maximum
    );
}

// Prints and returns the values lying more than $k * IQR below Q1 or above Q3.
// $k = 1.5 is the conventional choice (Tukey's fences).
function find_outliers($arr, $k = 1.5) {
    $fns = get_five_number_summary($arr);
    $interquartile_range = $fns[3] - $fns[1];
    $low = $fns[1] - $k * $interquartile_range;
    $high = $fns[3] + $k * $interquartile_range;
    $outliers = array();
    foreach ($arr as $v) {
        if ($v > $high || $v < $low) {
            echo "$v is an outlier<br>";
            $outliers[] = $v;
        }
    }
    return $outliers;
}

//$numbers = array( 19,20,21,21,22,30,60 ); // 60 is an outlier
$numbers = array( 1,230,239,331,340,800); // 1 is an outlier, 800 is an outlier
find_outliers($numbers);

Note that this method, albeit much simpler to implement than standard deviation, will not find the two 60 outliers in your example, but it works pretty well otherwise. Use the code for whatever; hopefully it's useful!

To see how the outlier rule behind this code works, go to: http://www.mathwords.com/o/outlier.htm

This, of course, doesn't calculate the final average, but it's kind of trivial after you run find_outliers() :P
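
Once you know which values are outliers, the final average is just the mean of what's left. A minimal sketch, assuming a hypothetical helper average_without() and passing in by hand the outliers for the sample above (1 and 800):

```php
<?php
// Average the values that are not listed in $outliers.
// Hypothetical helper; assumes at least one value survives the filter.
function average_without(array $values, array $outliers): float {
    $kept = array_filter($values, fn($v) => !in_array($v, $outliers));
    return array_sum($kept) / count($kept);
}

$numbers = array(1, 230, 239, 331, 340, 800);
echo average_without($numbers, array(1, 800)), "\n"; // prints 285
```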

David Titarenco 2010-10-19 20:36:57

Answer 4

A:

You might sort your numbers, choose your preferred subrange (e.g., the middle 90%), and take the mean of that.
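
A minimal PHP sketch of this trimmed mean, assuming a hypothetical function trimmed_mean() whose $keep parameter is the fraction of sorted values to keep (0.9 for the middle 90%). The count trimmed from each end is rounded to the nearest whole value:

```php
<?php
// Trimmed mean: sort, drop the same number of values from each end, average the rest.
// Assumes 0 < $keep <= 1 and enough values that something remains after trimming.
function trimmed_mean(array $xs, float $keep = 0.9): float {
    sort($xs);
    $n = count($xs);
    // values to drop from each end; round() avoids floating-point surprises
    // such as 20 * (1 - 0.9) / 2 evaluating to 0.9999999999999998
    $cut = (int) round($n * (1 - $keep) / 2);
    $middle = array_slice($xs, $cut, $n - 2 * $cut);
    return array_sum($middle) / count($middle);
}
```

For example, trimmed_mean(array_merge(range(1, 19), array(1000))) keeps the middle 90% of the 20 values, dropping 1 and 1000, and returns 10.5.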

There is no one true answer to your question, because there are always going to be distributions that will give you a funny answer (e.g., consider a biased bimodal distribution). This is why many statistics are presented using box-and-whisker diagrams showing mean, median, quartiles, and outliers.

Rafe 2010-10-19 22:35:44

Answer 5

A:

Why don't you use the median? It's not 30, it's 21.5.

Mike C 2010-10-20 00:09:04
