views:

99

answers:

3

I'm trying to find the fastest/most efficient way to extract the average value from a dict. The task I'm working on requires that it do this thousands of times, so simply iterating over all the values in the dict each time to find the average would be entirely inefficient. Hundreds and hundreds of new key,value pairs get added to the dict and we need to find the average value each time this occurs. We also need to find the new average value each time a value gets updated, which occurs thousands of times.

Thanks in advance--this is such an awesome place.

+1  A: 

Inherit from dict and calculate the average value each time __setitem__ is called.

Since you can store the previous average in your dictionary class and only average this and the new value that is added, that should be pretty fast - the first time a new item is added, the average value is simply that of this value.

Jim Brissom
+1  A: 

The following is based on running average, so if you know the previous average:

At = (A0 * N + E) / (N + 1)

At is the average after addition of the new element
A0 is the average before addition of the new element
N is the number of element before addition of the new element
E is the new element's value

Its simpler brother works if you keep tab of the sum of the elements:

At = (T + E) / (N + 1)

T is the total of all elements
A0 is the average before addition of the new element
N is the number of element before addition of the new element
E is the new element's value

When a value is deleted, you can do a similar thing:

At = (A0 * N - E) / (N - 1)

And when a value is updated:

At = (A0 * N - E0 + E1) / (N)

E0 is value before updating, E1 is value after updating.
Lie Ryan
+8  A: 

Create your own dict subclass that tracks the count and total, and then can quickly return the average:

class AvgDict(dict):
    def __init__(self):
        self._total = 0.0
        self._count = 0

    def __setitem__(self, k, v):
        if k in self:
            self._total -= self[k]
            self._count -= 1
        dict.__setitem__(self, k, v)
        self._total += v
        self._count += 1

    def __delitem__(self, k):
        v = self[k]
        dict.__delitem__(self, k)
        self._total -= v
        self._count -= 1

    def average(self):
        if self._count:
            return self._total/self._count

a = AvgDict()
assert a.average() is None
a[1] = 1
assert a.average() == 1
a[2] = 10
assert a.average() == 5.5
assert a[2] == 10
a[1] = 5
assert a.average() == 7.5
del a[1]
assert a.average() == 10
Ned Batchelder
Don't you need to override `__delitem__` too?
Wallacoloo
Perhaps not, since I'm not actually deleting any values--just updating them.
Georgina
Oops, I had overlooked `__delitem__`, I'll add it for completeness.
Ned Batchelder
you don't need to keep track of self._count yourself, python's len() already keep track of that for you.
Lie Ryan
@Lie: good point!
Ned Batchelder