ansaurus

Question

Need some help calculating percentile

Answer 1

A:

you will need to store a running sum and a total count.

then check out standard deviation calculations.

Randy 2010-06-21 00:21:39

-1 really? i do exactly this - and it works like a charm...

Randy 2010-06-30 18:08:53

This approach will only work if the data is normally (i.e. gaussian) distributed.

eglaser 2010-07-01 14:24:01

Answer 2

A:

Take one day's data. Use it to figure out what size to make your buckets (say one day's data shows that the vast majority (95%?) of your data is within 0.5 seconds of 1 second (ridiculous values, but hang in)

To get 65th percentile, you'll want at least 20 buckets in that range, but be generous, and make it 80. So you divide your 1 second window (-0.5 seconds to +0.5 seconds) into 80 buckets by making each 1/80th of a second wide.

Each bucket is 1/80th of 1 second. Make bucket 0 be (center - deviation) = (1 - 0.5) = 0.5 to itself + 1/80th of a second. Bucket 1 is 0.5+1/80th - 0.5 + 2/80ths. Etc.

For every value, find out which bucket it falls in, and increment a counter for that bucket.

To find 65th percentile, get the total count, and walk the buckets from zero until you get to 65% of that total.

Whenever you want to reset, set the counters all to zero.

If you always want to have good data available, keep two of these, and alternate resetting them, using the one you reset least recently as having more useful data.

Slartibartfast 2010-06-21 00:38:24

Answer 3

A:

Use an updown filter:

if q < x:
    q += .01 * (x - q)  # up a little
else:
    q += .005 * (x - q)  # down a little

Here a quantile estimator q tracks the x stream, moving a little towards each x. If both factors were .01, it would move up as often as down, tracking the 50 th percentile. With .01 up, .005 down, it floats up, 67 th percentile; in general, it tracks the up / (up + down) th percentile. Bigger up/down factors track faster but noisier -- you'll have to experiment on your real data.

(I have no idea how to analyze updowns, would appreciate a link.)

The updown() below works on long vectors X, Q in order to plot them: alt text

#!/usr/bin/env python
from __future__ import division
import sys
import numpy as np
import pylab as pl

def updown( X, Q, up=.01, down=.01 ):
    """ updown filter: running ~ up / (up + down) th percentile
        here vecs X in, Q out to plot
    """
    q = X[0]
    for j, x in np.ndenumerate(X):
        if q < x:
            q += up * (x - q)  # up a little
        else:
            q += down * (x - q)  # down a little
        Q[j] = q
    return q

#...............................................................................
if __name__ == "__main__":

    N = 1000
    up = .01
    down = .005
    plot = 0
    seed = 1
    exec "\n".join( sys.argv[1:] )  # python this.py N= up= down=
    np.random.seed(seed)
    np.set_printoptions( 2, threshold=100, suppress=True )  # .2f

    title = "updown random.exponential: N %d  up %.2g  down %.2g" % (N, up, down)
    print title
    X = np.random.exponential( size=N )
    Q = np.zeros(N)
    updown( X, Q, up=up, down=down )
        # M = np.zeros(N)
        # updown( X, M, up=up, down=up )
    print "last 10 Q:", Q[-10:]
    if plot:
        fig = pl.figure( figsize=(8,3) )
        pl.title(title)
        x = np.arange(N)
        pl.plot( x, X, "," )
        pl.plot( x, Q )
        pl.ylim( 0, 2 )
        png = "updown.png"
        print >>sys.stderr, "writing", png
        pl.savefig( png )
        pl.show()

Denis 2010-07-01 09:11:26

Looks to me like updown is tracking some biased variant of the mean, not the median.

Eamon Nerbonne 2010-09-23 15:24:14

Answer 4

A:

An easier way to get the value that represents a given percentile of a list or array is the scoreatpercentile function in the scipy.stats module.

>>>import scipy.stats as ss
>>>ss.scoreatpercentile(v,65)

there's a sibling percentileofscore to return the percentile given the value

Richard Careaga 2010-10-07 02:42:01

ansaurus

tags:

views:

answers:

Need some help calculating percentile

related questions