tags:

views:

256

answers:

5

I am wondering how to create forgiving dictionary (one that returns a default value if a KeyError is raised).

In the following code example I would get a KeyError; for example

a = {'one':1,'two':2}
print a['three']

In order not to get one I would 1. Have to catch the exeption or use get.

I would like to not to have to do that with my dictionary...

+19  A: 
import collections
a = collections.defaultdict(lambda: 3)
a.update({'one':1,'two':2})
print a['three']

emits 3 as required. You could also subclass dict yourself and override __missing__, but that doesn't make much sense when the defaultdict behavior (ignoring the exact missing key that's being looked up) suits you so well...

Edit ...unless, that is, you're worried about a growing by one entry each time you look up a missing key (which is part of defaultdict's semantics) and would rather get slower behavior but save some memory. For example, in terms of memory...:

>>> import sys
>>> a = collections.defaultdict(lambda: 'blah')
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
99 6284

...the defaultdict, originally empty, now has the 99 previously-missing keys that we looked up, and takes 6284 bytes (vs. the 140 bytes it took when it was empty).

The alternative approach...:

>>> class mydict(dict):
...   def __missing__(self, key): return 3
... 
>>> a = mydict()
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
0 140

...entirely saves this memory overhead, as you see. Of course, performance is another issue:

$ python -mtimeit -s'import collections; a=collections.defaultdict(int); r=xrange(99)' 'for i in r: _=a[i]'
100000 loops, best of 3: 14.9 usec per loop

$ python -mtimeit -s'class mydict(dict):
>   def __missing__(self, key): return 0
> ' -s'a=mydict(); r=xrange(99)' 'for i in r: _=a[i]'
10000 loops, best of 3: 92.9 usec per loop

Since defaultdict adds the (previously-missing) key on lookup, it gets much faster when such a key is next looked up, while mydict (which overrides __missing__ to avoid that addition) pays the "missing key lookup overhead" every time.

Whether you care about either issue (performance vs memory footprint) entirely depends on your specific use case, of course. It is in any case a good idea to be aware of the tradeoff!-)

Alex Martelli
Warning: defaultdict inserts a new item into itself whenever it returns the default value for a given key. That turns read operations into potential write operations, and means that looking up a lot of missing keys will cause it to grow quickly. http://docs.python.org/library/collections.html#collections.defaultdict.__missing__
Forest
@Forest, good point! Let me edit accordingly.
Alex Martelli
Excellent post! Your second to last paragraph does not seem to relate to your example since you never use the same key twice. So it seems that defaultdict is faster even if you never repeat a key and even faster if you do. Is that right?
Jeff
@Jeff, `timeit` _loops_ on the statement you're measuring, so **it** repeats that statement -- in this case, the `for i in r` loop.
Alex Martelli
+1 I never realized the performance/memory tradeoff of the 2 approaches. Thanks for enlightening me.
ma3
+7  A: 

New in version 2.5: If a subclass of dict defines a method __missing__(), if the key key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable. For an example, see collections.defaultdict.

http://docs.python.org/library/stdtypes.html

NullUserException
+2  A: 

You'll probably want to use a defaultdict (it requires atleast python2.5 I believe)

from collections import defaultdict
def default(): return 'Default Value'
d = defaultdict(default)
print(d['?'])

The function that is passed to the constructor tells the class what to return as a default value. See the documentation for additional examples.

brennie
+3  A: 

Here is how to subclass dict as suggested by NullUserException

>>> class forgiving_dict(dict):
...     def __missing__(self, key):
...         return 3
...
>>> a = forgiving_dict()
>>> a.update({'one':1,'two':2})
>>> print a['three']
3

One big difference between this answer and Alex's is that the missing key is not added to the dictionary

>>> print a
{'two': 2, 'one': 1}

Which is quite significant if you expect a lot of misses

gnibbler
A: 

Sometimes what you really want is .setdefault() which is not very intuitive, but it's a method that "returns the key specified, if it doesn't exist, set that key to this value".

Here is an example of setdefault() being used to good effect:

collection = {}
for elem in mylist:
    key = key_from_elem(elem)
    collection.setdefault(key, []).append(elem)

This will allow us to create a dictionary like: {'key1':[elem1, elem3], 'key2':[elem3]} without having to have an ugly check to see if there's a key already there and creating a list for it.

Jerub