views:

3693

answers:

6

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # this is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # what happened here? why is this False?
>>> 257 is 257
True           # yet the literal numbers compare properly

I am using Python 2.5.2. Trying some different versions of Python, it appears that a Python 2.3.3 shows the above behaviour between 99 and 100.

Based on the above, I can hypothesise that Python is internally implemented such that "small" integers are stored in a different way than larger integers, and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same, and I don't know in advance whether they are numbers or not?

+7  A: 

I think your hypotheses is correct. Experiment with id (identity of object)..

In [1]: id(255)
Out[1]: 146349024

In [2]: id(255)
Out[2]: 146349024

In [3]: id(257)
Out[3]: 146802752

In [4]: id(257)
Out[4]: 148993740

In [5]: a=255

In [6]: b=255

In [7]: c=257

In [8]: d=257

In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)

It appears that numbers <= 255 are treated as literals and anything above is treated differently!

Amit
Aack, posted just a moment before mine was. Good analysis though, and I think you could be correct.
Cybis
+41  A: 

Take a look at this:

>>> a = 256
>>> b = 256
>>> id(a)
9987148
>>> id(b)
9987148
>>> a = 257
>>> b = 257
>>> id(a)
11662816
>>> id(b)
11662828

EDIT: Here's what I found in the Python documentation, 7.2.1, "Plain Integer Objects":

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)

Cybis
The id function is the hash function that is used for dictionaries and such and doesn't have any relation to the "is" operator. The "is" operator is to detect multiple references (aliases) to the same object, not equality or hash equality. For instance for a = b = 20007, a is b _should_ be true. The fact that it works on separate assignments to small integers is just an implementation detail.
Adam K. Johnson
+6  A: 

The 'is' operator is not another way to type '=='.

thereisnospork
That's true, but I really do want to compare object identity.
Greg Hewgill
+11  A: 

It depends on whether you're looking to see if 2 things are equal, or the same object.

"is" checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency

In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144

You should use "==" to compare equality of arbitrary objects. You can specify the behavior with the __eq__, and __ne__ attributes.

JimB
My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value.
Greg Hewgill
In that case I suggest that you build a custom comparison function that checks the type of the operands and uses is or == appropriately.
David Locke
+11  A: 

As you can check here Python caches small ints for eficiency. Everytime you create a reference to a small int, you are referring the cached small int, not a new object. 257 is not an small int, so it is calculated as a different object.

It is better to use "==" for that purpose.

Angel
+5  A: 

For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.

wilberforce