views: 824
answers: 3

Is there a pythonic way to build up a list that contains a running average of some function?

After reading a fun little piece about Martians, black boxes, and the Cauchy distribution, I thought it would be fun to calculate a running average of the Cauchy distribution myself:

import math
import random

def cauchy(location, scale):
    # draw one Cauchy(location, scale) sample via the inverse CDF:
    # location + scale*tan(pi*(p - 0.5)), with p uniform in (0, 1)
    p = 0.0
    while p == 0.0:
        p = random.random()
    return location + scale*math.tan(math.pi*(p - 0.5))

# is this next block of code a good way to populate running_avg?
sum = 0
count = 0
max = 10
running_avg = []
while count < max:
    num = cauchy(3,1)
    sum += num
    count += 1
    running_avg.append(sum/count)

print running_avg     # or do something else with it, besides printing

I think this approach works, but I'm curious whether there's a more elegant way to build up that running_avg list than loops and counters (e.g. a list comprehension).

There are some related questions, but they address more complicated problems (small window size, exponential weighting) or aren't specific to Python.

+10  A: 

You could write a generator:

def running_average():
  sum = 0
  count = 0
  while True:
    sum += cauchy(3,1)
    count += 1
    yield sum/count
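For example, to collect the first ten values into a list (a small usage sketch; itertools.islice just caps the infinite generator):

import itertools

first_ten = list(itertools.islice(running_average(), 10))
print first_ten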

Or, given a generator for Cauchy numbers and a utility function for a running sum generator, you can have a neat generator expression:

# Cauchy numbers generator
def cauchy_numbers():
  while True:
    yield cauchy(3,1)

# running sum utility function
def running_sum(iterable):
  sum = 0
  for x in iterable:
    sum += x
    yield sum

# Running averages generator expression (** the neat part **)
running_avgs = (sum/(i+1) for (i,sum) in enumerate(running_sum(cauchy_numbers())))

# goes on forever
for avg in running_avgs:
  print avg

# alternatively, take just the first 10
import itertools
for avg in itertools.islice(running_avgs, 10):
  print avg
orip
Awesome. Just to be clear, would your first example be used like: gen = running_average(); running_avg = [gen.next() for i in range(10)] ? (creating the generator once so each .next() advances the same running average)
Nate Kohl
Yes, you could use it like that, or you could use `itertools.islice` like in the second example: `for avg in itertools.islice(running_average(), 10):`
orip
Neat use of generators in this solution. However, because of how many generators you have going on, it seems to be ~2x slower than a simpler LC solution, although that may just be the trade-off for yours being able to handle generators where the LC solution requires a list.
Bryan McLemore
I timed the simpler LC solution *including generating the Cauchy numbers* (which is what these 2 generator solutions do), and got slower numbers than the generators - see my comment below.
orip
I did, and you're right that the timings shift because I removed that portion from my timings but didn't realize it was included in yours. However, check out the new solution I just posted.
Bryan McLemore
+3  A: 

I've got two possible solutions here for you. Both are just generic running-average functions that work on any list of numbers (they could be made to work with any iterable).

Generator based:

nums = [cauchy(3,1) for x in xrange(10)]

def running_avg(numbers):
    for count in xrange(1, len(numbers)+1):
        yield sum(numbers[:count])/count

print list(running_avg(nums))

List Comprehension based (really the same logic as the generator above):

nums = [cauchy(3,1) for x in xrange(10)]

print [sum(nums[:count])/count for count in xrange(1, len(nums)+1)]

Generator-compatible Generator based:

Edit: I tested this one to see whether I could easily make my solution compatible with generators, and what its performance would be. This is what I came up with.

def running_avg(numbers):
    sum = 0
    for count, number in enumerate(numbers):
        sum += number
        yield sum/(count+1)
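For example (a small usage sketch, reusing orip's cauchy_numbers generator from the answer above), it can consume a generator directly:

import itertools

# works with any iterable, including an infinite generator
print list(itertools.islice(running_avg(cauchy_numbers()), 10))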

See the performance stats below; it's well worth it.

Performance characteristics:

Edit: I also decided to test Orip's interesting use of multiple generators to see the impact on performance.

Using timeit with the following (1,000,000 iterations, repeated 3 times):

from timeit import Timer

print "Generator based:", ', '.join(str(x) for x in Timer('list(running_avg(nums))', 'from __main__ import nums, running_avg').repeat())
print "LC based:", ', '.join(str(x) for x in Timer('[sum(nums[:count])/count for count in xrange(1, len(nums)+1)]', 'from __main__ import nums').repeat())
print "Orip's:", ', '.join(str(x) for x in Timer('list(itertools.islice(running_avgs, 10))', 'from __main__ import itertools, running_avgs').repeat())

print "Generator-compatible Generator based:", ', '.join(str(x) for x in Timer('list(running_avg(nums))', 'from __main__ import nums, running_avg').repeat())

I get the following results:

Generator based: 17.653908968, 17.8027219772, 18.0342400074
LC based: 14.3925321102, 14.4613749981, 14.4277560711
Orip's: 30.8035550117, 30.3142540455, 30.5146529675

Generator-compatible Generator based: 3.55352187157, 3.54164409637, 3.59098005295

See comments for code:

Orip's genEx based: 4.31488609314, 4.29926609993, 4.30518198013

Results are in seconds, and show the new generator-compatible generator method to be consistently faster, though your results may vary. I expect the massive difference between my original generator and the new one comes from the fact that the original recomputes sum(numbers[:count]) from scratch on every yield (building a slice and doing roughly count additions each time, which is quadratic overall), while the new one keeps a running sum and does constant work per yield.

Bryan McLemore
Interesting. Any idea (performance-wise) how these approaches compare to orip's first generator example?
Nate Kohl
Did you re-generate the Cauchy numbers each time, like the solution above does? If not, then my solutions are being timed on generating the numbers as well as on the running average, while yours isn't.
orip
The timings I get: your LC solution (with generating the numbers each time): [16.687758600203807, 16.715932782820914, 16.738767166880578], my first generator solution: [14.070051607753044, 14.052854863427882, 14.081863001340764], my generator expression: [15.121694400936235, 15.14989374874375, 15.192127309105331]
orip
No, Orip, mine doesn't include the time for calculating the numbers, except when I tested yours, so I'm sure that plays a role in the ~2x difference. Still, the stats on the new solution I just posted show a significant improvement.
Bryan McLemore
You're still not generating the numbers. I timed just the numbers generation on my machine (no running sum), and it's clearly what's taking the most time: [9.3933679211354502, 9.4039924096689447, 9.3787265585455089]
orip
For the summing, your solution with `enumerate` instead of `count+=1` should be faster than the simple generator I showed
orip
No, I'm not, as that cost is either going to be constant or just noise for the purpose of comparing the running-average code.
Bryan McLemore
In order to get a good comparison with the generator-expression solution, run `list((sum/(i+1) for (i,sum) in enumerate(running_sum(nums))))` (nums instead of cauchy_numbers())
orip
Orip's genEx based: 4.31488609314, 4.29926609993, 4.30518198013
Bryan McLemore
+3  A: 

You could use coroutines. They are similar to generators, but allow you to send values in. Coroutines were added in Python 2.5, so this won't work in earlier versions.

def running_average():
    sum = 0.0
    count = 0
    value = yield(float('nan'))
    while True:
        sum += value
        count += 1
        value = yield(sum/count)

ravg = running_average()
ravg.next()   # advance the coroutine to the first yield

for i in xrange(10):
    avg = ravg.send(cauchy(3,1))
    print 'Running average: %.6f' % (avg,)

As a list comprehension:

ravg = running_average()
ravg.next()
ravg_list = [ravg.send(cauchy(3,1)) for i in xrange(10)]
MizardX
Wow, that's quite flexible. I had no idea that you could use yield to send stuff _into_ a function.
Nate Kohl
+1, completely awesome
orip