views:

4449

answers:

4

I have a range of dates and a measurement on each of those dates. I'd like to calculate an exponential moving average for each of the dates. Does anybody know how to do this?

I'm new to python. It doesn't appear that averages are built into the standard python library, which strikes me as a little odd. Maybe I'm not looking in the right place.

So, given the following code, how could I calculate the moving weighted average of IQ points for calendar dates?

from datetime import date
days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
IQ = [110, 105, 90]

(there's probably a better way to structure the data, any advice would be appreciated)

+1  A: 

I don't know Python, but for the averaging part, do you mean an exponentially decaying low-pass filter of the form

y_new = y_old + (input - y_old)*alpha

where alpha = dt/tau, dt = the timestep of the filter, tau = the time constant of the filter? (the variable-timestep form of this is as follows, just clip dt/tau to not be more than 1.0)

y_new = y_old + (input - y_old)*dt/tau

If you want to filter something like a date, make sure you convert to a floating-point quantity like # of seconds since Jan 1 1970.

Jason S
+7  A: 

I did a bit of googling and I found the following sample code (http://osdir.com/ml/python.matplotlib.general/2005-04/msg00044.html):

def ema(s, n):
"""
returns an n period exponential moving average for
the time series s

s is a list ordered from oldest (index 0) to most
recent (index -1)
n is an integer

returns a numeric array of the exponential
moving average
"""
s = array(s)
ema = []
j = 1

#get n sma first and calculate the next n period ema
sma = sum(s[:n]) / n
multiplier = 2 / float(1 + n)
ema.append(sma)

#EMA(current) = ( (Price(current) - EMA(prev) ) x Multiplier) + EMA(prev)
ema.append(( (s[n] - sma) * multiplier) + sma)

#now calculate the rest of the values
for i in s[n+1:]:
   tmp = ( (i - ema[j]) * multiplier) + ema[j]
   j = j + 1
   ema.append(tmp)

return ema
earino
Please fix the indentation of the function block
bgbg
+1  A: 

My python is a little bit rusty (anyone can feel free to edit this code to make corrections, if I've messed up the syntax somehow), but here goes....

def movingAverageExponential(values, alpha, epsilon = 0):

   if not 0 < alpha < 1:
      raise ValueError("out of range, alpha='%s'" % alpha)

   if not 0 <= epsilon < alpha:
      raise ValueError("out of range, epsilon='%s'" % epsilon)

   result = [None] * len(values)

   for i in range(len(result)):
       currentWeight = 1.0

       numerator     = 0
       denominator   = 0
       for value in values[i::-1]:
           numerator     += value * currentWeight
           denominator   += currentWeight

           currentWeight *= alpha
           if currentWeight < epsilon: 
              break

       result[i] = numerator / denominator

   return result

This function moves backward, from the end of the list to the beginning, calculating the exponential moving average for each value by working backward until the weight coefficient for an element is less than the given epsilon.

At the end of the function, it reverses the values before returning the list (so that they're in the correct order for the caller).

(SIDE NOTE: if I was using a language other than python, I'd create a full-size empty array first and then fill it backwards-order, so that I wouldn't have to reverse it at the end. But I don't think you can declare a big empty array in python. And in python lists, appending is much less expensive than prepending, which is why I built the list in reverse order. Please correct me if I'm wrong.)

The 'alpha' argument is the decay factor on each iteration. For example, if you used an alpha of 0.5, then today's moving average value would be composed of the following weighted values:

today:        1.0
yesterday:    0.5
2 days ago:   0.25
3 days ago:   0.125
...etc...

Of course, if you've got a huge array of values, the values from ten or fifteen days ago won't contribute very much to today's weighted average. The 'epsilon' argument lets you set a cutoff point, below which you will cease to care about old values (since their contribution to today's value will be insignificant).

You'd invoke the function something like this:

result = movingAverageExponential(values, 0.75, 0.0001)
benjismith
How do you apply it to the non-continues data when it is available at non-uniform time intervals e.g., an in the question: today, 5 days ago, 6 days ago?
J.F. Sebastian
J.F. Sebastian
Conditions could be written as follows: if not 0 <= alpha <= 1: raise ValueError("out of range, expected 0..1 get: '%s'" % alpha)
J.F. Sebastian
Your algorithm is quadratic when (alpha==1 or epsilon==0). M=log(epsilon)/log(alpha) could be a large factor (number of time the internal loop is executed if len(values) is large), so I would not worry about `values.reverse()` -- it is just one more pass over the data.
J.F. Sebastian
There are algorithms that allows to compute AWME in one pass (see `ema()` from @earino's answer and `mov_average_expw()` from mine.
J.F. Sebastian
I'll fix the syntax errors a little later tonight or tomorrow (gotta go now). But on the subject of linear/quadratic time, I've always implemented SMA with a linear algorithm but never EMA, since you get different results by using SMA for the 1st N elements. And what if you only have alpha but no N?
benjismith
With respect to non-continuous data at non-uniform time intervals, I don't think it's the job of this function to handle those cases. A separate wrapper class could provide interpolation services on top of this data.
benjismith
I've fixed syntax errors. Hope, you don't mind..
J.F. Sebastian
+8  A: 

EDIT: It seems that mov_average_expw() function from scikits.timeseries.lib.moving_funcs submodule from SciKits (add-on toolkits that complement SciPy) better suits the wording of your question.


To calculate an exponential smoothing of your data with a smoothing factor alpha (it is (1 - alpha) in Wikipedia's terms):

>>> alpha = 0.5
>>> assert 0 < alpha <= 1.0
>>> av = sum(alpha**n.days * iq 
...     for n, iq in map(lambda (day, iq), today=max(days): (today-day, iq), 
...         sorted(zip(days, IQ), key=lambda p: p[0], reverse=True)))
95.0

The above is not pretty, so let's refactor it a bit:

from collections import namedtuple
from operator    import itemgetter

def smooth(iq_data, alpha=1, today=None):
    """Perform exponential smoothing with factor `alpha`.

    Time period is a day.
    Each time period the value of `iq` drops `alpha` times.
    The most recent data is the most valuable one.
    """
    assert 0 < alpha <= 1

    if alpha == 1: # no smoothing
        return sum(map(itemgetter(1), iq_data))

    if today is None:
        today = max(map(itemgetter(0), iq_data))

    return sum(alpha**((today - date).days) * iq for date, iq in iq_data)

IQData = namedtuple("IQData", "date iq")

if __name__ == "__main__":
    from datetime import date

    days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
    IQ = [110, 105, 90]
    iqdata = list(map(IQData, days, IQ))
    print("\n".join(map(str, iqdata)))

    print(smooth(iqdata, alpha=0.5))

Example:

$ python26 smooth.py
IQData(date=datetime.date(2008, 1, 1), iq=110)
IQData(date=datetime.date(2008, 1, 2), iq=105)
IQData(date=datetime.date(2008, 1, 7), iq=90)
95.0
J.F. Sebastian