views:

79

answers:

3

I've refactored how the merged-dictionary (all_classes) below is created, but I'm wondering if it can be more efficient.

I have a dictionary of dictionaries, like this:

groups_and_classes = {'group_1': {'class_A': [1, 2, 3],
                                  'class_B': [1, 3, 5, 7], 
                                  'class_c': [1, 2], # ...many more items like this
                                 },
                      'group_2': {'class_A': [11, 12, 13],
                                  'class_C': [5, 6, 7, 8, 9]
                                 }, # ...and many more items like this
                     }

A function creates a new object from groups_and_classes like this (the function to create this is called often):

all_classes = {'class_A': [1, 2, 3, 11, 12, 13],
               'class_B': [1, 3, 5, 7, 9],
               'class_C': [1, 2, 5, 6, 7, 8, 9]
              }

Right now, there is a loop that does this:

all_classes = {}
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        for v in vals:
            if all_classes.has_key(c):
                if v not in all_classes[c]:
                    all_classes[c].append(v)
            else:
                all_classes[c] = [v]

So far, I changed the code to use a set instead of a list since the order of the list doesn't matter and the values need to be unique:

all_classes = {}
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        try:
            all_classes[c].update(set(vals))
        except KeyError:
            all_classes[c] = set(vals)

This is a little nicer, and I didn't have to convert the sets to lists because of how all_classes is used in the code.

Question: Is there a more efficient way of creating all_classes (aside from building it at the same time groups_and_classes is built, and changing everywhere this function is called)?

+4  A: 

Here's a tweak for conciseness, though I'm not sure about performance:

from collections import defaultdict
all_classes = defaultdict(set)
for group in groups_and_classes.values():
    for c, vals in group.iteritems():
        all_classes[c].update(set(vals))

Defaultdicts are not quite the greatest thing since sliced bread, but they're pretty cool. :)

Vicki Laidler
The problem with defaultdicts IMO is that whenever you reference a key that isn't in it, that key is added to the dict even if you were just getting the value associated with that key (empty set in this case) rather than setting the key to a value.
Justin Peel
Combining @Brian's suggestion below with this solution, it is the fastest out of the 3 (mine and The Machine Charmer being the other 2)
jcoon
@justin - good point. In my use case, I convert the defaultdict to a dict, so that it won't cause problems.
jcoon
+2  A: 

One thing that might improve things slightly is to avoid the redundant conversion to a set, and just use:

all_classes[c].update(vals)

update can actually take an arbitrary iterable, as it essentially just iterates and adds, so you can avoid an extra conversion step.

Brian
Good tip - I didn't know that
jcoon
+2  A: 

Combining Dictionaries Of Lists In Python.

def merge_dols(dol1, dol2):
    result = dict(dol1, **dol2)
    result.update((k, dol1[k] + dol2[k]) for k in set(dol1).intersection(dol2))
    return result

g1 = groups_and_classes['group_1']
g2 = groups_and_classes['group_2']

all_classes = merge_dols(g1,g2)

OR

all_classes = reduce(merge_dols,groups_and_classes.values())

--copied from Alex Martelli

If you get more than two groups then you can use itertools.reduce

all_classes = reduce(merge_dols,groups_and_classes.values())
TheMachineCharmer