tags:

views:

111

answers:

4

in excel there is a frequency function:

The Excel FREQUENCY function This useful function can analyse a series of values and summarise them into a number of specified ranges. For example the heights of some children can be grouped in to four categories of [Less than 150cm]; [151 - 160cm]; [161 - 170cm]; [More than 170cm].

Would you like to learn more?

Excel 2003 Formulas by John Walkenbach (with CD)

FREQUENCY() is an unusual array function and it works differently to most other normal functions. It can not simply be typed into a cell or even entered properly using the Excel Function Wizard.

Note that this function does not analyse values into categories e.g. household expenditure into groups such as gas, electricity, water, rates etc. To perform this kind of analysis an Advanced Filter may be appropriate.

The frequency function has two arguments - the first is the range of cells containing values to be analysed; the second is the range of cells containing the upper values of each group banding. e.g. =FREQUENCY(A3:A120, B6:B10)

The second argument (the group upper limits) will exclude any values which exceed the highest category or banding. The function allows you to take account of this and extend the range of analysis to an additional category which contains all values that exceed the specified upper limit.

http://www.meadinkent.co.uk/xlfreq.htm

is there such a thing in python?

+3  A: 
import numpy
numpy.histogram( [ <data> ], [ <bins> ] )

Docs:

numpy.histogram(a, bins=10, range=None, normed=False, weights=None)

Compute the histogram of a set of data. Parameters:

a : array_like Input data. The histogram is computed over the flattened array.

bins : int or sequence of scalars, optional If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.

range : (float, float), optional The lower and upper range of the bins. If not provided, range is simply (a.min(), a.max()). Values outside the range are ignored.

normed : bool, optional If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function.

weights : array_like, optional An array of weights, of the same shape as a. Each value in a only contributes its associated weight towards the bin count (instead of 1). If normed is True, the weights are normalized, so that the integral of the density over the range remains 1

Returns:

hist : array The values of the histogram. See normed and weights for a description of the possible semantics.

bin_edges : array of dtype float Return the bin edges (length(hist)+1).

You may have to install numpy first.

katrielalex
+1  A: 

based on what the referenced page http://www.meadinkent.co.uk/xlfreq.htm states i wrote a function i'm sure that there are faster ways to do it but i'm sure this one is correct

def FREQUENCY(values, bands, max=None):
   counts = [0]*(len(bands)+1)
   for v in values:
       for i,b in enumerate(bands):
           if v <= b:
               counts[i] += 1
               break
           else if v > max:
               counts[-1] += 1
               break
   return counts
Dan D
Excel's FREQUENCY returns an array that is one element longer than bands. This version does not.
Steven Rumbalski
+3  A: 

The best option is to use numpy.histogram, but if you don't want to install numpy, here is one that works just like Excel:

def frequency(data, bins):
    # work with local sorted copy of bins for performance
    bins = bins[:]
    bins.sort()
    freqs = [0] * (len(bins)+1)
    for item in data:
        for i, bin_val in enumerate(bins):
            if item <= bin_val:
                freqs[i] += 1
                break
        else:
            freqs[len(bins)] += 1
    return freqs

Here's the example in Excel's help translated to python:

>>> data = [79, 85, 78, 85, 50, 81, 95, 88, 97]
... bins = [70, 79, 89]
... print frequency(data, bins)
[1, 2, 4, 2]

There is one minor difference. In Excel, if bins is empty, the length of data is returned as an integer. This python version returns that integer in a list. The reason for this is that the Python version will return a consistent data type (and still give the correct answer).

Steven Rumbalski
+1  A: 

I don't know whether there is such function in Python, but obviously you can write it:

def frequency(values, groups):
    # Build the solution
    toret = dict()
    toret[ None ] = list()

    # Sort them
    values.sort()
    groups.sort()

    # Run over groups
    i = 0
    for maxValue in groups:
        while ( ( values[ i ] < maxValue ) and ( i < len( values ) ) ):
            if ( toret.get( maxValue ) == None ):
                toret[ maxValue ] = list()
            toret[ maxValue ].append( values[ i ] )
            i += 1

        if ( i >= len( values ) ):
            break

    if ( i < len( values ) ):
        while( i < len( values ) ):
            toret[ None ].append( values[ i ] )
            i += 1

    return toret


l=[ 15,9,3,5,6,4,8,2,1,7,11,12 ]
g=[ 3,6,9 ]
print( frequency( l, g ) )

The result here is a dictionary, in which each element is one of the max values in the groups list. You can find the frequency by computing the length of each list.

The result is:

{None: [9, 11, 12, 15], 9: [6, 7, 8], 3: [1, 2], 6: [3, 4, 5]}
Baltasarq
Why all the parentheses?
PreludeAndFugue
If you refer to the parenthesis in sentences such as: "print( frequency( l, g ) )", then they are due to the use of Python 3. IF you refer to them in "if ( i < len( values ) ):", then it is personal use.
Baltasarq