views:

161

answers:

3

We have this set of data that we need to get the average of a column. a select avg(x) from y does the trick. However we need a more accurate figure.

I figured that there must be a way of filtering records that has either too high or too low values(spikes) so that we can exclude them in calculating the average.

+1  A: 

in sql server there's also the STDEV function so maybe that can be of some help...

Mladen Prajdic
+3  A: 

There are three types of average, and what you are originally using is the mean - the sum of all the values divided by the number of values.

You might find it more useful to get the mode - the most frequently occuring value:

select name,    
       (select top 1 h.run_duration
     from sysjobhistory h
     where h.step_id = 0
     and h.job_id = j.job_id
     group by h.run_duration
     order by count(*) desc) run_duration
from sysjobs j

If you did want to get rid of any values outside the original standard deviation, you could find the average and the standard deviation in a subquery, eliminate those values which are outside the range : average +- standard deviation, then do a further average of the remaining values, but you start running the risk of having meaningless values:

select oh.job_id, avg(oh.run_duration) from sysjobhistory oh
inner join (select job_id, avg(h.run_duration) avgduration, 
            stdev(h.run_duration) stdev_duration 
            from sysjobhistory h 
            group by job_id) as m on m.job_id = oh.job_id
where oh.step_id = 0
and abs(oh.run_duration - m.avgduration) <  m.stdev_duration
group by oh.job_id
Jonathan
the first sql is not an option because the columns value is randomthe second one works great!
ken
A: 

Your question is hard to answer .. how to better calculate an average than with avg?

Please add some more detail:

  • What do you mean by more accurate?
  • What do you mean by "too high" or "too low" values?
IronGoofy
We actually wanted to retrieve the average of delta time of a given set. 99% of the set have consistent delta. We want to get rid the 1% as they are considered noise.
ken
That's outlier elimination, a common statistical request. Not hard to understand.
bortzmeyer
Thx bortzmeyer for the info. I'm having a hard time googling because I didn't know the corrent term.
ken