tags:

views:

453

answers:

2

I have a list of dictionaries, e.g:

dictList = [
    {'a':3, 'b':9, 'c':4},
    {'a':9, 'b':24, 'c':99},
    {'a':10, 'b':23, 'c':88}
]

All the dictionaries have the same keys e.g. a, b, c. I wish to create a single dictionary with the same keys where the values are the sums of the values with the same keys from all the dictionaries in the original list.

So for the above example, the output should be:

{'a':22, 'b':56, 'c':191}

What would be the most efficient way of doing this? I currently have:

result = {}
for myDict in dictList:
    for k in myDict:
        result[k] = result.setdefault(k, 0) + myDict[k]
+9  A: 

If all dicts have all keys, you could do this as:

>>> dict((key, sum(d[key] for d in dictList)) for key in dictList[0])
{'a': 22, 'b': 56, 'c': 191}

[Edit] If speed is a big priority, you can also shave off ~20% (though at the cost of some readability) with the following instead:

import operator, itertools
dict((key, sum(itertools.imap(operator.itemgetter(key), dictList))) 
      for key in dictList[0])

The speed depends on the size of the dict. I get the following timings for the original 3 item list, and for various different sizes (created by mutliplying the original list by 10, 100 or 1000 etc):

List Size   Original      dict+generator       imap+itemgetter
      3      0.054          0.090                0.097
     30      0.473          0.255                0.236
    300      4.668          1.884                1.529
   3000     46.668         17.975               14.499

(All times for 10,000 runs)

So it's slightly slower for just 3, but two to three times as fast for larger lists.

Brian
+1 and if they don't have all keys: dict((key, sum(d.get(key, 0) for d in dictList)) for key in dictList[0]))
Nadia Alramli
@Nadia: both Brian's answer and your comment give the same code
paffnucy
@paffnucy: no, they don't/
SilentGhost
@paffnucy, I'm using `get` it will return 0 if the key doesn't exist
Nadia Alramli
+1 sorry, something wrong with my eyes :)
paffnucy
-1. That's slower than the original. At least on my machine, using the example input.
kotlinski
@Nadia: In that case you also need a complete list of available keys - if dictList[0] is missing some, you won't get a complete result.
Brian
@kotlinski: Timing depends on the size of data you're processing. This is actually much faster for larger lists, though it might have a larger constant factor cost for small lists. See the timing figures I've given
Brian
@Brian: +1 then ;)
kotlinski
Where does the speed gain come from? It seems to me that you are both running once over every key of every dictionary, and all you've done is switch the order of the loops...
Jaime
I'd guess just from certain things being faster / slower. There's no algorithmic improvement, since the ratios are pretty constant beyond a certain size, so I suspect it's simply a constant factor gain, but a larger setup cost meaning it does worse on small lists.
Brian
@Jamie: speed gain comes from single application of sum function over a numerous summation at each iteration. additionally, dict is built only once with, and there is no need to "re-assign" its items' values at each iteration.
SilentGhost
+4  A: 

Try this.

from collections import defaultdict
result = defaultdict(int)
for myDict in dictList:
    for k in myDict:
        result[k] += myDict[k]
S.Lott
+1 for learning me yet-another trick.
NicDumZ