tags:

views:

291

answers:

3

So, cPython (2.4) has some interesting behaviour when the length of something gets near to 1<<32 (the size of an int).

r = xrange(1<<30)
assert len(r) == 1<<30

is fine, but:

r = xrange(1<<32)
assert len(r) == 1<<32
ValueError: xrange object size cannot be reported`__len__() should return 0 <= outcome

Alex's wowrange has this behaviour as well. wowrange(1<<32).l is fine, but len(wowrange(1<<32)) is bad. I'm guessing there is some floating point behaviour (being read as negative) action going on here.

  1. What exactly is happening here? (this is pretty well-solved below!)
  2. How can I get around it? Longs?

(My specific application is random.sample(xrange(1<<32),ABUNCH)) if people want to tackle that question directly!)

+1  A: 

1<<32, when treated as a signed integer, is negative.

Anon.
+7  A: 

cPython assumes that lists fit in memory. This extends to objects that behave like lists, such as xrange. essentially, the len function expects the __len__ method to return something that is convertable to size_t, which won't happen if the number of logical elements is too large, even if those elements don't actually exist in memory.

TokenMacGuy
thank you for explaining why `len` in particular is behaving this way. cPython len expects `size_t`.
Gregg Lind
Minor quibble: just because a length is too big for a `size_t` doesn't mean that the object doesn't fit in memory. For example I have a class representing a bit field for which `__len__` stops working for objects over 256MB in 32-bit Python.
Scott Griffiths
+4  A: 

You'll find that

xrange(1 << 31 - 1)

is the last one that behaves as you want. This is because the maximum signed (32-bit) integer is 2^31 - 1.

1 << 32 is not a positive signed 32-bit integer (Python's int datatype), so that's why you're getting that error.

In Python 2.6, I can't even do xrange(1 << 32) or xrange(1 << 31) without getting an error, much less len on the result.

Edit If you want a little more detail...

1 << 31 represents the number 0x80000000 which in 2's complement representation is the lowest representable negative number (-1 * 2^31) for a 32-bit int. So yes, due to the bit-wise representation of the numbers you're working with, it's actually becoming negative.

For a 32-bit 2's complement number, 0x7FFFFFFF is the highest representable integer (2^31 - 1) before you "overflow" into negative numbers.

Further reading, if you're interested.

Note that when you see something like 2147483648L in the prompt, the "L" at the end signifies that it's now being represented as a "long integer" (64 bits, usually, I can't make any promises on how Python handles it because I haven't read up on it).

Sapph
Try `hex(1 << 32)` and see what you get. Hint: it's not 0x80000000.
Mark Byers
Typo, sorry, fixed.
Sapph