ansaurus

Question

How to create single Python dict from a list of dicts by summing values with common keys?

Answer 1

+9 A:

If all dicts have all keys, you could do this as:

>>> dict((key, sum(d[key] for d in dictList)) for key in dictList[0])
{'a': 22, 'b': 56, 'c': 191}

[Edit] If speed is a big priority, you can also shave off ~20% (though at the cost of some readability) with the following instead:

import operator, itertools
dict((key, sum(itertools.imap(operator.itemgetter(key), dictList))) 
      for key in dictList[0])

The speed depends on the size of the dict. I get the following timings for the original 3 item list, and for various different sizes (created by mutliplying the original list by 10, 100 or 1000 etc):

List Size   Original      dict+generator       imap+itemgetter
      3      0.054          0.090                0.097
     30      0.473          0.255                0.236
    300      4.668          1.884                1.529
   3000     46.668         17.975               14.499

(All times for 10,000 runs)

So it's slightly slower for just 3, but two to three times as fast for larger lists.

Brian 2009-06-10 10:06:16

+1 and if they don't have all keys: dict((key, sum(d.get(key, 0) for d in dictList)) for key in dictList[0]))

Nadia Alramli 2009-06-10 10:10:13

@Nadia: both Brian's answer and your comment give the same code

paffnucy 2009-06-10 10:12:56

@paffnucy: no, they don't/

SilentGhost 2009-06-10 10:14:06

@paffnucy, I'm using `get` it will return 0 if the key doesn't exist

Nadia Alramli 2009-06-10 10:17:28

+1 sorry, something wrong with my eyes :)

paffnucy 2009-06-10 10:19:11

-1. That's slower than the original. At least on my machine, using the example input.

kotlinski 2009-06-10 10:20:06

@Nadia: In that case you also need a complete list of available keys - if dictList[0] is missing some, you won't get a complete result.

Brian 2009-06-10 10:20:19

@kotlinski: Timing depends on the size of data you're processing. This is actually much faster for larger lists, though it might have a larger constant factor cost for small lists. See the timing figures I've given

Brian 2009-06-10 10:35:48

@Brian: +1 then ;)

kotlinski 2009-06-10 10:39:43

Where does the speed gain come from? It seems to me that you are both running once over every key of every dictionary, and all you've done is switch the order of the loops...

Jaime 2009-06-10 11:01:16

I'd guess just from certain things being faster / slower. There's no algorithmic improvement, since the ratios are pretty constant beyond a certain size, so I suspect it's simply a constant factor gain, but a larger setup cost meaning it does worse on small lists.

Brian 2009-06-10 11:05:00

@Jamie: speed gain comes from single application of sum function over a numerous summation at each iteration. additionally, dict is built only once with, and there is no need to "re-assign" its items' values at each iteration.

SilentGhost 2009-06-10 11:07:45

Answer 2

+4 A:

Try this.

from collections import defaultdict
result = defaultdict(int)
for myDict in dictList:
    for k in myDict:
        result[k] += myDict[k]

S.Lott 2009-06-10 10:52:58

+1 for learning me yet-another trick.

NicDumZ 2009-06-11 01:39:37

ansaurus

tags:

views:

answers:

How to create single Python dict from a list of dicts by summing values with common keys?

related questions