How can I calculate the median and standard deviation of a bunch stream of numbers in Perl?

views:

907

answers:

+6 Q:

How can I calculate the median and standard deviation of a bunch stream of numbers in Perl?

Hi everyone,

In our logfiles we store response times for the requests. What's the most efficient way to calculate the median response time, the "75/90/95% of requests were served in less than N time" numbers etc? (I guess a variation of my question is: What's the best way to calculate the median and standard deviation of a bunch stream of numbers).

The best I came up with was just reading all the numbers, ordering them and then picking out the numbers, but that seems really goofy. Isn't there a smarter way?

We use Perl, but solutions for any language might be helpful.

+5 A:

you can have look at quick select:

http://en.wikipedia.org/wiki/Selection%5Falgorithm

Or at the Wirth algorithm: http://www.mail-archive.com/[email protected]/msg20059.html

Benchmark for the median can be found here: http://ndevilla.free.fr/median/median/index.html

LeMiz 2009-09-29 07:54:35

+8 A:

See the article Calculating Percentiles in Memory-bound Applications. It explains how to calculate median and other percentiles efficiently.

Also, here's an article on calculating standard deviation (variance) as you go: Accurately computing running variance.

John D. Cook 2009-09-29 08:00:10

+3 A:

Have a look at PDL... the Perl Data Language.

Also see these previous SO questions about mean/std dev:

/I3az/

draegtun 2009-09-29 12:50:21

Thanks draegtun!

Ask Bjørn Hansen 2009-09-30 15:40:05

+1 A:

There are code examples here: http://rosettacode.org/wiki/Standard%5FDeviation

glenn jackman 2009-09-30 13:47:46

ansaurus

tags:

views:

answers:

How can I calculate the median and standard deviation of a bunch stream of numbers in Perl?

related questions