views:

158

answers:

5

In one of my classes I have a number of methods that all draw values from the same dictionaries. However, if one of the methods tries to access a value that isn't there, it has to call another method to make the value associated with that key.

I currently have this implemented as follows, where findCrackDepth(tonnage) assigns a value to self.lowCrackDepth[tonnage].

if tonnage not in self.lowCrackDepth:
    self.findCrackDepth(tonnage)
lcrack = self.lowCrackDepth[tonnage]

However, it would also be possible for me to do this as

try:
    lcrack = self.lowCrackDepth[tonnage]
except KeyError:
    self.findCrackDepth(tonnage)
    lcrack = self.lowCrackDepth[tonnage]

I assume there is a performance difference between the two related to how often the values is already in the dictionary. How big is this difference? I'm generating a few million such values (spread across a many dictionaries in many instances of the class), and for each time the value doesn't exist, there are probably two times where it does.

+1  A: 

If it is exceptional, use an exception. If you expect the key to be in there, use try/except, if you don't know whether the key is in there, use not in.

Sjoerd
+3  A: 

Checking if a key exists is cheaper or at least as cheap as retrieving it. So use the if not in solution which is much cleaner and more readable.

According to your question a key not existing is not an error-like case so there's no good reason to let python raise an exception (even though you catch it immediately), and if you have a if not in check, everyone knows your intention - to get the existing value or otherwise generate it.

ThiefMaster
+2  A: 

When in doubt, profile.

Run a test to see if, in your environment, one runs faster than another.

Oddthinking
+12  A: 

It's a delicate problem to time this because you need care to avoid "lasting side effects" and the performance tradeoff depends on the % of missing keys. So, consider a dil.py file as follows:

def make(percentmissing):
  global d
  d = dict.fromkeys(range(100-percentmissing), 1)

def addit(d, k):
  d[k] = k

def with_in():
  dc = d.copy()
  for k in range(100):
    if k not in dc:
      addit(dc, k)
    lc = dc[k]

def with_ex():
  dc = d.copy()
  for k in range(100):
    try: lc = dc[k]
    except KeyError:
      addit(dc, k)
      lc = dc[k]

def with_ge():
  dc = d.copy()
  for k in range(100):
    lc = dc.get(k)
    if lc is None:
      addit(dc, k)
      lc = dc[k]

and a series of timeit calls such as:

$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_in()'
10000 loops, best of 3: 28 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ex()'
10000 loops, best of 3: 41.7 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ge()'
10000 loops, best of 3: 46.6 usec per loop

this shows that, with 10% missing keys, the in check is substantially the fastest way.

$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_in()'
10000 loops, best of 3: 24.6 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ex()'
10000 loops, best of 3: 23.4 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ge()'
10000 loops, best of 3: 42.7 usec per loop

with just 1% missing keys, the exception approach is marginally fastest (and the get approach remains the slowest one in either case).

So, for optimal performance, unless the vast majority (99%+) of lookups is going to succeed, the in approach is preferable.

Of course, there's another, elegant possibility: adding a dict subclass like...:

class dd(dict):
   def __init__(self, *a, **k):
     dict.__init__(self, *a, **k)
   def __missing__(self, k):
     addit(self, k)
     return self[k]

def with_dd():
  dc = dd(d)
  for k in range(100):
    lc = dc[k]

However...:

$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_dd()'
10000 loops, best of 3: 46.1 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_dd()'
10000 loops, best of 3: 55 usec per loop

...while slick indeed, this is not a performance winner -- it's about even with the get approach, or slower, just with much nicer-looking code to use it. (defaultdict, semantically analogous to this dd class, would be a performance win if it was applicable, but that's because the __missing__ special method, in that case, is implemented in well optimized C code).

Alex Martelli
Looking at defaultdict, it requires a callable as an argument. Could I provide a function I defined? I would need to pass an argument to the function, which doesn't seem possible with defaultdict. Otherwise, it would seem perfect.
Wilduck
@Wilduck, the fact that `defaultdict` takes a factory callable which must be callable (and gets called) **without arguments** is exactly what makes it unsuitable here. The `__missing__` special method itself (which `defaultdict` overrides) does get called with the missing key as the arg, but, as I've shown, overriding it in Python does not provide the best performance, unfortunately.
Alex Martelli
That's what I thought. Thanks for the confirmation and the in depth answer.
Wilduck
A: 

I believe the .get() method of a dict has a parameter for setting the default value. You could use that and have it in one line. I'm not sure how it affects performance though.

Daenyth