Here's a common situation when compiling data in dictionaries from different sources:

Say you have a dictionary that stores lists of things, such as things I like:

likes = {
    'colors': ['blue','red','purple'],
    'foods': ['apples', 'oranges']
}

and a second dictionary with some related values in it:

favorites = {
    'colors':'yellow',
    'desserts':'ice cream'
}

You then want to iterate over the "favorites" dictionary and, for each key, either append its value to the list under the same key in "likes" or, if the key is missing, add it to "likes" with a one-element list containing that value.

There are several ways to do this:

for key in favorites:
    if key in likes:
        likes[key].append(favorites[key])
    else:
        likes[key] = [favorites[key]]

or

for key in favorites:
    try:
        likes[key].append(favorites[key])
    except KeyError:
        likes[key] = [favorites[key]]

And many more as well...

I generally use the first syntax because it feels more pythonic, but if there are other, better ways, I'd love to know what they are. Thanks!
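
One detail worth checking when a new list is built from a single string value: list() iterates over its argument, so a string is split into characters, while a list literal wraps the value whole. A minimal sketch (the name value is only illustrative):

```python
value = 'ice cream'

# list() iterates over the string, producing one element per character.
split_chars = list(value)

# A list literal keeps the string as a single element.
wrapped = [value]

print(split_chars)  # ['i', 'c', 'e', ' ', 'c', 'r', 'e', 'a', 'm']
print(wrapped)      # ['ice cream']
```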

+5  A: 

Use collections.defaultdict, where the default value is a new list instance.

>>> import collections
>>> mydict = collections.defaultdict(list)

This way, calling .append(...) always succeeds: for a non-existent key, append is called on a fresh empty list.

You can also initialize the defaultdict from an existing dictionary, in case you get the dict likes from another source, like so:

>>> mydict = collections.defaultdict(list, likes)

Note that using list as the default_factory attribute of a defaultdict is also discussed as an example in the documentation.
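
A side effect worth knowing: with a defaultdict, merely looking up a missing key with square brackets stores a new entry, while in and .get() do not. A small sketch, with made-up keys:

```python
import collections

mydict = collections.defaultdict(list, {'colors': ['blue']})

# Membership tests and .get() do not trigger the default factory.
has_foods = 'foods' in mydict
got = mydict.get('foods')

# Indexing a missing key calls list() and stores the result under that key.
looked_up = mydict['foods']

print(has_foods)       # False
print(got)             # None
print(looked_up)       # []
print(sorted(mydict))  # ['colors', 'foods']
```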

Stephan202
defaultdict is available in Python 2.5 and onwards only; on older versions you have to define your own defaultdict class.
Gregg Lind
+3  A: 

Use collections.defaultdict:

import collections

likes = collections.defaultdict(list)

for key, value in favorites.items():
    likes[key].append(value)

defaultdict takes a factory as its first argument; it is called to create values for unknown keys on demand. list is such a function: it creates empty lists.

And iterating over .items() will save you from using the key to get the value.

Ned Batchelder
Good tip on using .items(). That's one of the things I love about Python. There's always a better, faster, smarter way.
Gabriel Hurley
+1  A: 
>>> from collections import defaultdict
>>> d = defaultdict(list, likes)
>>> d
defaultdict(<class 'list'>, {'colors': ['blue', 'red', 'purple'], 'foods': ['apples', 'oranges']})
>>> for i, j in favorites.items():
...     d[i].append(j)
...
>>> d
defaultdict(<class 'list'>, {'desserts': ['ice cream'], 'colors': ['blue', 'red', 'purple', 'yellow'], 'foods': ['apples', 'oranges']})
SilentGhost
+2  A: 

Besides defaultdict, the regular dict offers another possibility (which might look a bit strange): dict.setdefault(k[, d]):

for key, val in favorites.items():
    likes.setdefault(key, []).append(val)

Thank you for the +20 in rep -- I went from 1989 to 2009 in 30 seconds. Let's remember it is 20 years since the Wall fell in Europe..

kaizer.se
Ah, nice. defaultdict seems like the "right" solution here, but that's a cool alternative.
Gabriel Hurley
Note that the first example at http://docs.python.org/3.1/library/collections.html#defaultdict-examples explicitly states that using a `defaultdict` is faster than using the `setdefault` method.
Stephan202
Also good to know. Thanks Stephan.
Gabriel Hurley
Another caveat is that the value must be mutable; this won't work: `likes.setdefault(key, 0) += 1`
kaizer.se
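
For immutable values such as counters, one common workaround is to read with get() and assign the result back. A minimal sketch (the words list is invented for the example):

```python
words = ['blue', 'red', 'blue']
counts = {}

for word in words:
    # Integers are immutable, so the updated value must be assigned back.
    counts[word] = counts.get(word, 0) + 1

print(sorted(counts.items()))  # [('blue', 2), ('red', 1)]
```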
One more consideration for setdefault: it will evaluate both of its arguments, so even if the key exists, you will be creating and discarding the empty list anyway.
Ned Batchelder
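
The eager evaluation can be made visible with a factory that records its calls. This is only a sketch; make_list and calls are hypothetical names introduced for the demonstration:

```python
import collections

calls = []

def make_list():
    # Hypothetical factory that records each call so they can be counted.
    calls.append(1)
    return []

likes = {'colors': ['blue']}

# The default argument is built before setdefault() runs, even though
# 'colors' already exists and the new list is thrown away.
likes.setdefault('colors', make_list()).append('yellow')
calls_after_setdefault = len(calls)

likes_dd = collections.defaultdict(make_list, {'colors': ['blue']})

# defaultdict only calls its factory when the key is missing.
likes_dd['colors'].append('yellow')
calls_after_defaultdict = len(calls)

print(calls_after_setdefault)   # 1
print(calls_after_defaultdict)  # 1 (no additional call)
print(likes['colors'])          # ['blue', 'yellow']
```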
setdefault was the way to do it before defaultdict. Now it feels a bit awkward.
gnibbler
Ned: I thought CPython had a cache of empty lists for reuse like this, which would speed it up. Can't find a reference now though.
kaizer.se
`print id([]), id([])`
kaizer.se
+1  A: 

Most of the answers use defaultdict, but I'm not sure that's the best way to go about it. Handing a defaultdict to code that expects a plain dict can be bad, because looking up a missing key silently adds it. (See: http://stackoverflow.com/questions/3031817/how-do-i-make-a-defaultdict-safe-for-unexpecting-clients ) I'm personally torn on the matter. (I actually found this question while looking for an answer to "which is better, dict.get() or defaultdict".) Someone in the other thread said that you don't want a defaultdict if you don't want this behavior all the time, and that might be true. Maybe using defaultdict just for the convenience is the wrong way to go about it. I think there are two needs being conflated here:

"I want a dict whose default values are empty lists." to which defaultdict(list) is the correct solution.

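and

"I want to append to the list at this key if it exists and create a list if it does not exist." to which my_dict.setdefault('foo', []) with append() is the answer. (my_dict.get('foo', []) on its own returns the new list without storing it in the dict, so the appended value would be lost.)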

What do you guys think?

Ademan
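
A sketch of the two points raised in this answer, using made-up data: get() returns its fallback without storing it, whereas setdefault() stores it, and a defaultdict can be converted or "switched off" before it is handed to code that expects a plain dict.

```python
import collections

likes = {}

# get() returns the fallback list but never stores it in the dict,
# so the appended value is lost.
likes.get('colors', []).append('yellow')
after_get = dict(likes)

# setdefault() stores the fallback under the key before returning it.
likes.setdefault('colors', []).append('yellow')
after_setdefault = dict(likes)

print(after_get)         # {}
print(after_setdefault)  # {'colors': ['yellow']}

# Before handing a defaultdict to code that expects a plain dict,
# either copy it into a dict or switch off the factory.
dd = collections.defaultdict(list, likes)
plain = dict(dd)
dd.default_factory = None  # missing keys now raise KeyError again

try:
    dd['desserts']
    raised = False
except KeyError:
    raised = True

print(type(plain) is dict)  # True
print(raised)               # True
```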