I stumbled upon the following Python oddity:

>>> two = 2
>>> ii = 2

>>> id(two) == id(ii)
True
>>> [id(i) for i in [42,42,42,42]]
[10084276, 10084276, 10084276, 10084276]

>>> help(id)
Help on built-in function id in module __builtin__:

id(...)
    id(object) -> integer

    Return the identity of an object.  This is guaranteed to be unique among
    simultaneously existing objects.  (Hint: it's the object's memory address.)

I have the following incredulous questions:

  1. Is every number a unique object?
  2. Are different variables holding the same elemental value (e.g. two and ii) the same object?
  3. How is the id of a number generated by Python?
  4. In the above example, are two and ii pointers to a memory cell holding the value 2? That would be extremely weird.

Help me untangle this identity crisis.

Some more oddities:

>>> a, b = id(0), id(1)
>>> for i in range(2, 1000):
...     a, b = b, id(i)
...     if abs(a - b) != 12:
...         print('%i:%i -> %i' % (i, a, b))
...

The code above checks whether the ids of consecutive integers are exactly 12 apart, and prints the anomalies:

77:10083868 -> 10085840
159:10084868 -> 10086840
241:10085868 -> 10087840
257:10087660 -> 11689620
258:11689620 -> 11689512
259:11689512 -> 11689692
260:11689692 -> 11689548
261:11689548 -> 11689644
262:11689644 -> 11689572
263:11689572 -> 11689536
264:11689536 -> 11689560
265:11689560 -> 11689596
266:11689596 -> 11689656
267:11689656 -> 11689608
268:11689608 -> 11689500
331:11688756 -> 13807288
413:13806316 -> 13814224
495:13813252 -> 13815224
577:13814252 -> 13816224
659:13815252 -> 13817224
741:13816252 -> 13818224
823:13817252 -> 13819224
905:13818252 -> 13820224
987:13819252 -> 13821224

Note that a pattern emerges from 413 onwards. Maybe it's due to some voodoo accounting at the beginning of each new memory page. Any clues?

+9  A: 

Small integers (-5 through 256 in current CPython), as well as some string literals, are interned. Each occurrence in the source actually refers to the same object.

In CPython, the result of id() is the address in the process space of the PyObject.
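A small probe of this behaviour, as a sketch only. The helper name same_object_for is made up for illustration, and whatever it reports is an implementation detail of the interpreter you run it on, not a language guarantee. The integers are built with int() at run time so that the compiler has no chance to merge equal literals.

```python
import sys


def same_object_for(n):
    """Return True if two separately computed ints equal to n are one object."""
    a = int(str(n))  # built at run time, so no sharing of literals by the compiler
    b = int(str(n))
    return a is b


if __name__ == "__main__":
    print(sys.implementation.name)
    for n in (-6, -5, 0, 256, 257, 10**6):
        # The answers depend on the interpreter and its version; do not rely on them.
        print(n, same_object_for(n))
```

On a CPython build you would expect the values inside the cached range to report shared objects, but the exact boundaries are something to observe, not to depend on.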

Ignacio Vazquez-Abrams
This is only necessarily true for CPython. And even there, no-one has guaranteed that won't change, even during the syntax freeze.
jcdyer
True. It used to only be up to 99 or so in older versions.
Ignacio Vazquez-Abrams
+1  A: 

You should be very careful with these sorts of investigations. You are looking into the internals of the implementation of the language, and those are not guaranteed. The help on id is spot-on: the number will be different for two different objects, and the same for the same object. As an implementation detail, in CPython it is the memory address of the object. CPython might decide to change this detail at any time.
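The part of id() that you can rely on looks like this (a minimal sketch; the variable names are only for illustration). Two objects that are alive at the same time never share an id, and two names for one object always do.

```python
first = []       # two distinct lists, both alive at the same time
second = []
alias = first    # a second name for the first list, not a copy

print(id(first) == id(alias))    # True: same object, same id
print(id(first) == id(second))   # False: different live objects, different ids
```

Nothing here promises what the numbers look like, only how they compare while the objects exist. Once an object is gone, its id may be handed to a new object.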

The fact that small integers are interned, so that equal values share a single allocation, is also an implementation detail that could change at any time.

Also, if you switch from CPython to Jython, or PyPy, or IronPython, all bets are off, other than the documentation on id().

Ned Batchelder
+1  A: 

Not every number is a unique object, and the fact that some are is an optimization detail of the CPython interpreter. Do not rely on this behavior. For that matter, never use `is` to test for equality. Only use `is` if you are absolutely sure you need the exact same object.
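To make the difference concrete, here is a short sketch using lists, where the language leaves no room for sharing (the names are illustrative only):

```python
a = [1, 2, 3]
b = list(a)   # an equal but separate list
c = a         # another name for the very same list

print(a == b)   # True: same contents
print(a is b)   # False: two different list objects
print(a is c)   # True: one object with two names
```

With numbers and strings the second line could come out either way depending on the interpreter, which is exactly why `==` is the right tool for comparing values.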

jcdyer
Caveat: You should always use `is` when comparing to `None`.
jcdyer
And `Ellipsis`. Although you'll never have to do that.
Ignacio Vazquez-Abrams
The general rule, straight out of PEP-8, is "use is when comparing to a singleton object". That means None, True, False, Ellipsis, etc. However, True and False are generally best omitted altogether, to allow boolean coercion to take place.
Tim Lesher
+2  A: 

Your fourth question, "in the above example, are two and ii pointers to a memory cell holding the value 2? that would be extremely weird", is really the key to understanding the whole thing.

If you're familiar with languages like C, be aware that Python "variables" don't really work the same way. A C variable declaration like:

int j=1;
int k=2;
k += j;

says, "compiler, reserve for me two areas of memory, on the stack, each with enough space to hold an integer, and remember one as 'j' and the other as 'k'. Then fill j with the value '1' and k with the value '2'." At runtime, the code says "take the integer contents of k, add the integer contents of j, and store the result back to k."

The seemingly equivalent code in Python:

j = 1
k = 2
k += j

says something different: "Python, look up the object known as '1', and create a label called 'j' that points to it. Look up the object known as '2', and create a label called 'k' that points to it. Now look up the object 'k' points to ('2'), look up the object 'j' points to ('1'), and point 'k' to the object resulting from performing the 'add' operation on the two."
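You can watch the relabeling happen with a third name (a sketch; previous is just an extra label added for illustration):

```python
j = 1
k = 2
previous = k    # a second label on the object that k currently points to

k += j          # ints are immutable, so this points k at a different object

print(k)                # 3
print(previous)         # 2: the object k used to point to is unchanged
print(k is previous)    # False
```

If k held a mutable object such as a list, += would modify that object in place instead, and both labels would see the change.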

Disassembling this code (with the dis module) shows this nicely:

  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (j)

  3           6 LOAD_CONST               2 (2)
              9 STORE_FAST               1 (k)

  4          12 LOAD_FAST                1 (k)
             15 LOAD_FAST                0 (j)
             18 INPLACE_ADD
             19 STORE_FAST               1 (k)
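If you want to produce a listing like this yourself, something along these lines should work. It is a sketch: the function name add_labels is made up, and the opcode names and offsets you get will differ between Python versions, so treat the listing above as one version's output.

```python
import dis


def add_labels():
    j = 1
    k = 2
    k += j
    return k


# Opcode names vary across versions, so collect them instead of hard-coding any.
opnames = [instruction.opname for instruction in dis.get_instructions(add_labels)]

if __name__ == "__main__":
    dis.dis(add_labels)   # human-readable listing, comparable to the one above
    print(opnames)
```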

So yes, Python "variables" are labels that point to objects, rather than containers that can be filled with data.

The other three questions are all variations on "when does Python create a new object from a piece of code, and when does it reuse one it already has?". The latter is called "interning"; it happens to small integers and to strings that look (to Python) like they might be symbol names.

Tim Lesher
"object known as '2', and create a label called 'j'" that should be '1'
tolomea
Thanks, tolomea--fixed!
Tim Lesher
+8  A: 

Every implementation of Python is fully allowed to optimize to any extent (including.... none at all;-) the identity and allocation of immutable objects (such as numbers, tuples and strings) [[no such latitude exists for mutable objects, such as lists, dicts and sets]].

Between two immutable object references a and b, all the implementation must guarantee is:

  1. id(a) == id(b), AKA a is b, must always imply a == b
  2. and therefore a != b must always imply id(a) != id(b) AKA a is not b

Note in particular there is no constraint, even for immutable types, that a == b must imply a is b (i.e. that id(a) == id(b)). Only None makes that guarantee (so you can always test if x is None: rather than if x == None:).
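One practical reason to prefer the identity test for None: == can be overridden, identity cannot. A minimal sketch, using a made-up class that claims to be equal to everything:

```python
class AlwaysEqual:
    """Hypothetical class whose instances compare equal to anything."""

    def __eq__(self, other):
        return True


x = AlwaysEqual()

print(x == None)   # True, because __eq__ says so
print(x is None)   # False: x is not the None object
```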

Current CPython implementations take advantage of these degrees of freedom by "merging" (having a single allocation, and thus a single id, for) small integers in a certain range, as well as built-in immutable-type objects whose literals appear more than once within a given function. For example, if your function f has four occurrences of the literal 'foobar', they will all refer to a single instance of the string 'foobar' within the function's constants, saving a little space compared to the permissible implementation that would store four identical but separate copies of that constant.
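You can peek at a function's constants to see this merging. This is a sketch that leans on the CPython-style __code__.co_consts attribute; the function name repeated_literal is made up, and the counts are whatever your interpreter chooses to do.

```python
import sys


def repeated_literal():
    a = 'foobar'
    b = 'foobar'
    c = 'foobar'
    d = 'foobar'
    return a is b is c is d


# co_consts holds the constants compiled into the function's code object.
occurrences = repeated_literal.__code__.co_consts.count('foobar')

if __name__ == "__main__":
    print(sys.implementation.name)
    print(repeated_literal())   # implementation-dependent
    print(occurrences)          # how many copies of 'foobar' the function stores
```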

All of these implementation considerations are of pretty minor interest to Python coders (unless you're working on a Python implementation, or at least something that's tightly bound to a specific implementation, such as a debugging system).

Alex Martelli