views: 385
answers: 4

Possible Duplicate:
'has_key()' or 'in'?

I have a Python dictionary like :

mydict = {'name': 'abc', 'city': 'xyz', 'country': 'def'}

I want to check whether a key is in the dictionary or not. Which of the following two forms is preferable, and why?

1> if mydict.has_key('name'):
2> if 'name' in mydict:
+20  A: 
if 'name' in mydict:

is the preferred, pythonic version. Use of has_key() is discouraged, and this method has been removed in Python 3.
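
A minimal sketch for illustration (the dictionary is the one from the question; the `print` lines are just there to show the flow):

mydict = {'name': 'abc', 'city': 'xyz', 'country': 'def'}

if 'name' in mydict:          # works in Python 2 and 3
    print(mydict['name'])     # -> abc

# mydict.has_key('name')      # Python 2 only; AttributeError in Python 3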

Tim Pietzcker
Also, `"name" in dict` will work with any iterable and not just dictionaries.
Noufal Ibrahim
What about `dict.get(key)`? Should that also be avoided?
sukhbir
@PulpFiction: `dict.get(key)` can be useful when you (1) don't want a `KeyError` in case `key` is not in the dict, or (2) want to use a default value if there is no `key` (`dict.get(key, default)`). Point #2 can be done using `defaultdict` as well.
Manoj Govindan
@Manoj: Aah thanks.
sukhbir
dict.get returns the value. It does not (and cannot) reliably tell you if the key is in the dictionary. It's an entirely different purpose.
Joe
@Joe, 1) It can reliably tell you that, but using it for just that is of course silly, and 2) Manoj is addressing the issue at a *higher level*. You usually have a reason for checking if a key is in a dict, and the reasons you have are very often handled more smoothly by `get`, `setdefault`, and `defaultdict`.
Mike Graham
+8  A: 

In terms of bytecode, `in` saves a `LOAD_ATTR` and replaces a `CALL_FUNCTION` with a `COMPARE_OP`:
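
For reference, a sketch of the two functions being disassembled below - the names `indict` and `haskey` and the globals `name` and `d` are assumptions reconstructed from the output, and `has_key` requires Python 2:

import dis

def indict():
    name in d           # membership test compiles to COMPARE_OP (in)

def haskey():
    d.has_key(name)     # attribute lookup plus a function call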

>>> dis.dis(indict)
  2           0 LOAD_GLOBAL              0 (name)
              3 LOAD_GLOBAL              1 (d)
              6 COMPARE_OP               6 (in)
              9 POP_TOP             


>>> dis.dis(haskey)
  2           0 LOAD_GLOBAL              0 (d)
              3 LOAD_ATTR                1 (has_key)
              6 LOAD_GLOBAL              2 (name)
              9 CALL_FUNCTION            1
             12 POP_TOP             

My feeling is that `in` is much more readable and is to be preferred in every case I can think of.

In terms of performance, the timings reflect the bytecode difference:

$ python -mtimeit -s'd = dict((i, i) for i in range(10000))' "'foo' in d"
 10000000 loops, best of 3: 0.11 usec per loop

$ python -mtimeit -s'd = dict((i, i) for i in range(10000))' "d.has_key('foo')"
  1000000 loops, best of 3: 0.205 usec per loop

`in` is almost twice as fast.

aaronasterling
Any speed measures are of course problem specific, usually irrelevant, implementation-dependent, potentially version-dependent, and less important than deprecation and style issues.
Mike Graham
@Mike Graham, you're mostly right. I did stick the worst case in there, though, because, IMO, that's where you really want to know. Also, I think that your attitude (while still absolutely correct) is slightly more appropriate to a language like C, where it's fast either way unless you really mess something up. In Python it pays to get it right to a greater degree. Also, the core devs have a way of tuning "the one right way" to do something, so that, again, performance is a good indicator of good style to a greater extent than normal in a language.
aaronasterling
+4  A: 

My answer is "neither one".

I believe the most "Pythonic" way to do things is NOT to check beforehand whether the key is in the dictionary, but instead to just write code that assumes it's there and catch the `KeyError` that gets raised when it isn't.

This is usually done by enclosing the code in a `try...except` clause. It's a well-known idiom, usually expressed as "it's easier to ask forgiveness than permission", or EAFP for short, which basically means it is better to try something and catch the errors than to make sure everything's OK before doing anything. Why validate what doesn't need to be validated, when you can handle exceptions gracefully instead of trying to avoid them? It's often more readable, and the code tends to be faster when the probability is low that the key won't be there (or whatever the preconditions may be).

Of course, this isn't appropriate in all situations, and not everyone agrees with the philosophy, so you'll need to decide for yourself on a case-by-case basis. Not surprisingly, the opposite approach is called LBYL, for "Look Before You Leap".

As a trivial example consider:

if 'name' in dct:
    value = dct['name'] * 3
else:
    logerror('"name" not found in dictionary, using default')
    value = 42

vs

try:
    value = dct['name'] * 3
except KeyError:
    logerror('"name" not found in dictionary, using default')
    value = 42

Although in this case it's almost exactly the same amount of code, the second version doesn't spend time checking first and is probably slightly faster because of it (a `try...except` block isn't totally free, though, so it probably doesn't make much difference here).

Generally speaking, testing in advance can often be much more involved, and the savings gained by not doing it can be significant. That said, `if 'name' in mydict:` is better for the reasons stated in the other answers.
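
If you want to gauge the tradeoff for your own hit rate, a sketch along the lines of the timings in the other answer (commands only; no numbers are shown since they depend on your interpreter and data):

# key present: the exception is never raised
$ python -mtimeit -s"d = {'name': 1}" "try: d['name']" "except KeyError: pass"
$ python -mtimeit -s"d = {'name': 1}" "if 'name' in d: d['name']"

# key missing: the exception is actually raised and caught
$ python -mtimeit -s"d = {}" "try: d['name']" "except KeyError: pass"
$ python -mtimeit -s"d = {}" "if 'name' in d: d['name']"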

If you're interested in the topic, the message titled "EAFP vs LBYL (was Re: A little disappointed so far)" from the Python mailing list archive probably explains the difference between the two approaches better than I have here. There's also a good discussion of the two approaches in the book Python in a Nutshell, 2nd Ed. by Alex Martelli, in the chapter on exceptions titled "Error-Checking Strategies".

martineau
Is there data to support the statement "the savings gain from not doing it can be significant"? As a Java developer, I'm used to thinking that exceptions are expensive and should be for truly exceptional situations. Your recommendation sounds like "exception as goto". Can you cite a source?
duffymo
@duffymo. No, I can't cite a source. I made the statement because there are usually many ways something can go wrong versus a relatively small number of right ways. Checking for all the wrong ways can involve a lot of code (which is often also tedious to write). Handling exceptions can be slow in some languages, and I specifically mentioned that this might not be a good way to go if you expect them to happen frequently. I am also not advocating using exceptions as part of the normal or regular control flow of a program -- they should be used, as you put it, for exceptional circumstances.
martineau
Exceptions in Python are expensive. If you expect the key to be missing more than a few percent of the time, the exception cost will probably dominate the runtime of the function.
Joe
@duffymo, @Joe: See http://stackoverflow.com/questions/2522005/cost-of-exception-handlers-in-python - exceptions are faster than `if` statements, and they are considered more pythonic (the Python philosophy is "it's easier to ask forgiveness than permission" instead of "look before you leap"). And +1 for this very relevant answer.
Tim Pietzcker
@Tim - Thank you. I appreciate the explanation, because I'm in the process of learning Python, too.
duffymo
@duffymo, The prevailing style in Python is to use exceptions. This creates more idiomatic, readable code. Generally speaking, a succeeding `try` block is pretty cheap, and a raised exception is more expensive, but *this isn't what dictates the design of 95% of the code you write*.
Mike Graham
@Tim: Did you miss where I said "If you expect the key to be missing more than a few percent of the time"? Exceptions are only about as fast as `if` statements if they don't happen - if they do happen, your link shows them 2x slower for zero division, and my quick timeit shows them _10x_ slower for dict lookups. Screw "Pythonic", I'll take idioms that run 10x faster.
Joe
@Joe. If you expect something to happen relatively frequently, it's faster to check for it in advance than to use exceptions, which are slower to handle when they occur. Your code may be more complicated with the extra checking, but that's the tradeoff. Exceptions should not be part of the 'normal' program flow; they are generally for things that aren't expected to happen often (they're exceptional).
martineau
+5  A: 

In the same vein as martineau's response, the best solution is often not to check. For example, the code

if x in d:
    foo = d[x]
else:
    foo = bar

is normally written

foo = d.get(x, bar)

which is shorter and more directly speaks to what you mean.

Another common case is something like

if x not in d:
    d[x] = []

d[x].append(foo)

which can be rewritten

d.setdefault(x, []).append(foo)

or, better yet, rewritten by using a `collections.defaultdict(list)` for `d` and writing

d[x].append(foo)
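
For completeness, a runnable sketch of the defaultdict version (`'x'` and `'foo'` stand in for whatever key and value you have):

from collections import defaultdict

d = defaultdict(list)      # missing keys get a fresh empty list
d['x'].append('foo')       # no membership test or setdefault needed
print(d['x'])              # -> ['foo']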
Mike Graham
Yes, something you could call "intelligent defaults" (or even "intelligent design" ;-)
martineau
Naw, these are methods and types Python evolved over time. `;)`
Mike Graham
OMG, you're right!
martineau
@martineau Generally, a design process is incapable of initially producing a good solution. It is not until a solution hits the real world that it can be improved. Generally, given enough random mutations, one of them will be superior to the original design.
aaronasterling
@AaronMcSmooth Since we're not talking about nature here, I would hope the mutations weren't entirely random. As Frederick Brooks famously said in his 1975 book *The Mythical Man-Month*, **"plan to throw one away; you will, anyhow"**. The real downside, IMHO, is that often you can't really afford to do that and end up having to be backwards compatible due to numerous dependencies that arose while the evolution was taking place. That's why the best designs are often those that reduce dependencies.
martineau