views:

114

answers:

4

I have a very long list of dictionaries with string keys and integer values. Many of the keys are the same across the dictionaries, though not all. I want to generate one dictionary in which the keys are the union of the keys in the separate dictionaries and the value for each key is the sum of all the values for that key across the dictionaries. (For example, the value for the key 'apple' in the combined dictionary will be the value of 'apple' in the first dictionary, plus the value of 'apple' in the second, and so on.)

I have the following, but it's rather cumbersome and takes ages to execute. Is there a simpler way to achieve the same result?

def combine_dicts(list_dictionaries):
    comb_dict = {}
    for dictionary in list_dictionaries:
        for key in dictionary:
            comb_dict.setdefault(key, 0)
            comb_dict[key] += dictionary[key]
    return comb_dict
+1  A: 

Use collections.defaultdict instead.

http://docs.python.org/library/collections.html#defaultdict-objects

Slightly simpler.
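
A minimal sketch of how that might look here (reusing the question's list_dictionaries; the code is mine, not from this answer):

import collections

comb_dict = collections.defaultdict(int)
for dictionary in list_dictionaries:
    for key, value in dictionary.items():
        # missing keys start at 0, so we can add unconditionally
        comb_dict[key] += value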

S.Lott
`update` overwrites values
katrielalex
+1  A: 

This could be fast too, but it really depends on your data. It avoids mutating dicts or building extra lists - just one set of all the keys and lots of reads :-)

from itertools import chain

def union(dict_list):
    all_keys = set(chain.from_iterable(dict_list))
    for key in all_keys:
        yield key, sum(d.get(key, 0) for d in dict_list)

combined = dict(union(dict_list))
THC4k
Although this uses more sophisticated functions, I cannot imagine that this will be faster (but I could be wrong). In the OP's code, the list of dictionaries is only traversed once, as is every dictionary. In your code, every dictionary gets traversed once to create the set of keys, and then the list of dicts gets traversed `len(all_keys)` times.
Felix Kling
Felix Kling: Well, I just tried it; when I add an iterator (see revisions ;-) to traverse only once, it gets even slower. I guess that extra hashing from the set is the problem.
THC4k
A: 

You could take some inspiration from Google's MapReduce. From what I understand, it was designed to solve just this type of problem.
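
One way the map/shuffle/reduce phases might look for this problem (the function name and the use of itertools.groupby are my own choices, not something given in this answer):

from itertools import groupby
from operator import itemgetter

def map_reduce_sum(list_dictionaries):
    # "map": emit a (key, value) pair for every item in every dictionary
    pairs = [(key, value) for d in list_dictionaries for key, value in d.items()]
    # "shuffle": bring pairs with the same key together (sorting stands in for the shuffle phase)
    pairs.sort(key=itemgetter(0))
    # "reduce": sum the values within each key's group
    return {key: sum(value for _, value in group)
            for key, group in groupby(pairs, key=itemgetter(0))}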

Swizec Teller
+9  A: 

Here are some microbenchmarks which suggest f2 (see below) might be an improvement. f2 uses iteritems, which lets you avoid an extra dict lookup in the inner loop:

import collections
import string
import random

def random_dict():
    n=random.randint(1,26)
    keys=list(string.letters)
    random.shuffle(keys)
    keys=keys[:n]
    values=[random.randint(1,100) for _ in range(n)]    
    return dict(zip(keys,values))

list_dictionaries=[random_dict() for x in xrange(100)]

def f1(list_dictionaries):
    comb_dict = {}  
    for dictionary in list_dictionaries:  
        for key in dictionary:  
            comb_dict.setdefault(key, 0)  
            comb_dict[key] += dictionary[key]  
    return comb_dict

def f2(list_dictionaries):    
    comb_dict = collections.defaultdict(int)
    for dictionary in list_dictionaries:  
        for key,value in dictionary.iteritems():  
            comb_dict[key] += value
    return comb_dict

def union( dict_list ):
    all_keys = set()
    for d in dict_list:
        for k in d:
            all_keys.add( k )
    for key in all_keys:
        yield key, sum( d.get(key,0) for d in dict_list)

def f3(list_dictionaries):
    return dict(union( list_dictionaries ))

Here are the results:

% python -mtimeit -s"import test" "test.f1(test.list_dictionaries)"
1000 loops, best of 3: 776 usec per loop
% python -mtimeit -s"import test" "test.f2(test.list_dictionaries)"
1000 loops, best of 3: 432 usec per loop    
% python -mtimeit -s"import test" "test.f3(test.list_dictionaries)"
100 loops, best of 3: 2.19 msec per loop
unutbu
Thanks! f2() actually cut about 80% of the time out for my particular application. YRMV, obviously.
thebackhand