tags:

views:

146

answers:

6

My question is: where do these patterns (below) originate?

I learned (somewhere) that Python has unique "copies", if that's the right word, for small integers. For example:

>>> x = y = 0
>>> id(0)
4297074752
>>> id(x)
4297074752
>>> id(y)
4297074752
>>> x += 1
>>> id(x)
4297074728
>>> y
0

When I look at the memory locations of ints, there is a simple pattern early:

>>> N = id(0)
>>> for i in range(5):
...     print i, N - id(i)
... 
0 0
1 24
2 48
3 72
4 96
>>> bin(24)
'0b11000'

It's not clear to me why this is chosen as the increment. Moreover, I can't explain this pattern at all above 256:

>>> prev = 0
>>> for i in range(270):
...     t = (id(i-1), id(i))
...     diff = t[0] - t[1]
...     if diff != prev:
...         print i-1, i, t, diff
...         prev = diff
... 
-1 0 (4297074776, 4297074752) 24
35 36 (4297073912, 4297075864) -1952
36 37 (4297075864, 4297075840) 24
76 77 (4297074904, 4297076856) -1952
77 78 (4297076856, 4297076832) 24
117 118 (4297075896, 4297077848) -1952
118 119 (4297077848, 4297077824) 24
158 159 (4297076888, 4297078840) -1952
159 160 (4297078840, 4297078816) 24
199 200 (4297077880, 4297079832) -1952
200 201 (4297079832, 4297079808) 24
240 241 (4297078872, 4297080824) -1952
241 242 (4297080824, 4297080800) 24
256 257 (4297080464, 4297155264) -74800
257 258 (4297155072, 4297155288) -216
259 260 (4297155072, 4297155336) -264
260 261 (4297155048, 4297155432) -384
261 262 (4297155024, 4297155456) -432
262 263 (4297380280, 4297155384) 224896
263 264 (4297155000, 4297155240) -240
264 265 (4297155072, 4297155216) -144
266 267 (4297155072, 4297155168) -96
267 268 (4297155024, 4297155144) -120

Any thoughts, clues, places to look?

Edit: and what's special about 24?

Update: The standard library has sys.getsizeof() which returns 24 when I call it with 1 as argument. That's a lot of bytes, but on a 64-bit machine, we have 8 bytes each for the type, the value and the ref count. Also, see here, and the C API reference here.

Spent some time with the "source" in the link from Peter Hansen in comments. Couldn't find the definition of an int (beyond a declaration of *int_int), but I did find:

#define NSMALLPOSINTS       257
#define NSMALLNEGINTS       5
A: 

In CPython, IDs are the address of the objects.

Brian
+2  A: 

The first 256 ints are preallocated

>>> for n in range(1000):
>>>  if id(n) != id(n+0):
>>>   print n
>>>   break

257
gnibbler
+6  A: 

Low-value integers are preallocated, high value integers are allocated whenever they are computed. Integers that appear in source code are the same object. On my system,

>>> id(2) == id(1+1)
True
>>> id(1000) == id(1000+0)
False
>>> id(1000) == id(1000)
True

You'll also notice that the ids depend on the system. They're just memory addresses, assigned by the system allocator (or possibly the linker, for static objects?)

>>> id(0)
8402324

Edit: The reason id(1000) == id(1000) is because the Python compiler notices that two of the integer constants in the code it's compiling are the same, so it only allocates one object for both. This would be an unacceptable performance hit at runtime, but at compile time it's not noticeable. (Yes, the interpreter is also a compiler. Most interpreters are also compilers, very few aren't.)

Dietrich Epp
Specifically (and only currently) it's values from -5 to 256. It might be a good idea to explain in the answer why the `id(1000)` case returns True. (It's one of those things that's only reasonably obvious once it's been explained to you.)
Peter Hansen
Thanks, added explanation.
Dietrich Epp
+1  A: 

integers up to 256 are interned, that's why you see a "pattern" there. All other integers are created during execution and as such allocated random id.

SilentGhost
OK. "random" or whatever, above 256.
telliott99
+2  A: 

I seem to recall that Python internally caches copies of ints < 256 to save having to create new Python objects for commonly-used cases. So once you go over 256 you're getting newly created objects each time which may appear to be 'randomly' laid out in memory (obviously their locations would make sense to Python's allocator, but probably not to us).

Peter
+2  A: 

Since others have fully answered why ids have a distinct pattern up to 256, I thought I would answer your addendum instead: 24 is the size in bytes of an integer object in python. When the first 256 integers are allocated they are done so contiguously in memory, leading to a difference of 24 between each memory address.

Clueless