ansaurus

Question

Answer 1

+19 A:

import collections
a = collections.defaultdict(lambda: 3)
a.update({'one':1,'two':2})
print a['three']

emits 3 as required. You could also subclass dict yourself and override __missing__, but that doesn't make much sense when the defaultdict behavior (ignoring the exact missing key that's being looked up) suits you so well...

Edit ...unless, that is, you're worried about a growing by one entry each time you look up a missing key (which is part of defaultdict's semantics) and would rather get slower behavior but save some memory. For example, in terms of memory...:

>>> import sys
>>> a = collections.defaultdict(lambda: 'blah')
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
99 6284

...the defaultdict, originally empty, now has the 99 previously-missing keys that we looked up, and takes 6284 bytes (vs. the 140 bytes it took when it was empty).

The alternative approach...:

>>> class mydict(dict):
...   def __missing__(self, key): return 3
... 
>>> a = mydict()
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
0 140

...entirely saves this memory overhead, as you see. Of course, performance is another issue:

$ python -mtimeit -s'import collections; a=collections.defaultdict(int); r=xrange(99)' 'for i in r: _=a[i]'
100000 loops, best of 3: 14.9 usec per loop

$ python -mtimeit -s'class mydict(dict):
>   def __missing__(self, key): return 0
> ' -s'a=mydict(); r=xrange(99)' 'for i in r: _=a[i]'
10000 loops, best of 3: 92.9 usec per loop

Since defaultdict adds the (previously-missing) key on lookup, it gets much faster when such a key is next looked up, while mydict (which overrides __missing__ to avoid that addition) pays the "missing key lookup overhead" every time.

Whether you care about either issue (performance vs memory footprint) entirely depends on your specific use case, of course. It is in any case a good idea to be aware of the tradeoff!-)

Alex Martelli 2010-07-29 00:22:05

Warning: defaultdict inserts a new item into itself whenever it returns the default value for a given key. That turns read operations into potential write operations, and means that looking up a lot of missing keys will cause it to grow quickly. http://docs.python.org/library/collections.html#collections.defaultdict.__missing__

Forest 2010-07-29 01:25:57

@Forest, good point! Let me edit accordingly.

Alex Martelli 2010-07-29 01:30:57

Excellent post! Your second to last paragraph does not seem to relate to your example since you never use the same key twice. So it seems that defaultdict is faster even if you never repeat a key and even faster if you do. Is that right?

Jeff 2010-07-29 13:50:53

@Jeff, `timeit` _loops_ on the statement you're measuring, so **it** repeats that statement -- in this case, the `for i in r` loop.

Alex Martelli 2010-07-29 15:35:09

+1 I never realized the performance/memory tradeoff of the 2 approaches. Thanks for enlightening me.

ma3 2010-10-14 04:58:13

Answer 2

+7 A:

New in version 2.5: If a subclass of dict defines a method __missing__(), if the key key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable. For an example, see collections.defaultdict.

http://docs.python.org/library/stdtypes.html

NullUserException 2010-07-29 00:22:07

Answer 3

+2 A:

You'll probably want to use a defaultdict (it requires atleast python2.5 I believe)

from collections import defaultdict
def default(): return 'Default Value'
d = defaultdict(default)
print(d['?'])

The function that is passed to the constructor tells the class what to return as a default value. See the documentation for additional examples.

brennie 2010-07-29 00:25:42

Answer 4

+3 A:

Here is how to subclass dict as suggested by NullUserException

>>> class forgiving_dict(dict):
...     def __missing__(self, key):
...         return 3
...
>>> a = forgiving_dict()
>>> a.update({'one':1,'two':2})
>>> print a['three']
3

One big difference between this answer and Alex's is that the missing key is not added to the dictionary

>>> print a
{'two': 2, 'one': 1}

Which is quite significant if you expect a lot of misses

gnibbler 2010-07-29 01:12:20

Answer 5

A:

Sometimes what you really want is .setdefault() which is not very intuitive, but it's a method that "returns the key specified, if it doesn't exist, set that key to this value".

Here is an example of setdefault() being used to good effect:

collection = {}
for elem in mylist:
    key = key_from_elem(elem)
    collection.setdefault(key, []).append(elem)

This will allow us to create a dictionary like: {'key1':[elem1, elem3], 'key2':[elem3]} without having to have an ugly check to see if there's a key already there and creating a list for it.

Jerub 2010-07-29 03:05:04

ansaurus

tags:

views:

answers:

A forgiving dictionary

related questions