tags:
views: 520
answers: 3

I'm learning to use Numpy and I wanted to see the difference in speed when summing a list of numbers, so I wrote this code:

import numpy
import time

np_array = numpy.arange(1000000)
start = time.time()
sum_ = np_array.sum()
print time.time() - start, sum_

>>> 0.0 1783293664

python_list = range(1000000)
start = time.time()
sum_ = sum(python_list)
print time.time() - start, sum_

>>> 0.390000104904 499999500000

The python_list sum is correct.

If I run the same code summing only up to 1000, both print the right answer. Is there an upper limit to the length of the Numpy array, or is the problem in the Numpy sum function?

Thanks for your help

+2  A: 

The standard Python sum switched over to doing arithmetic with the long type when the running total got larger than a 32-bit int.

The numpy array did not switch to long, and suffered from integer overflow. The price for speed is a smaller range of allowed values.

>>> 499999500000 % 2**32
1783293664L
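
A minimal sketch that reproduces the wraparound on any machine (in the same Python 2 / numpy setup as the question) is to force a 32-bit accumulator explicitly via sum's dtype argument:

>>> import numpy
>>> numpy.arange(1000000).sum(dtype=numpy.int32)   # force a 32-bit accumulator
1783293664
>>> sum(range(1000000))                            # plain Python promotes to long
499999500000L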
S.Lott
+2  A: 

Numpy is creating an array of 32-bit integers (the platform default on a 32-bit build). When it sums them, it accumulates the result in a 32-bit value, which overflows.

# the reported sum is exactly the true sum reduced modulo 2**32
if 499999500000L % (2**32) == 1783293664L:
    print "Overflowed a 32-bit integer"

You can explicitly choose the data type at array creation time:

>>> a = numpy.arange(1000000, dtype=numpy.uint64)
>>> a.sum()
499999500000
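
To see which integer type you actually got, and hence where it will wrap, you can inspect the array's dtype; a small sketch (the dtype('int32') shown assumes a 32-bit build like the questioner's; most 64-bit builds default to int64 and don't overflow at this size):

>>> import numpy
>>> a = numpy.arange(1000000)
>>> a.dtype                      # platform default integer type
dtype('int32')
>>> numpy.iinfo(a.dtype).max     # largest value that type can hold
2147483647

The true total, 499999500000, is well past that ceiling, which is why the accumulator wraps.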
Joe Koberg
+3  A: 

Notice that 499999500000 % 2**32 equals exactly 1783293664 ... i.e., numpy is doing operations modulo 2**32, because that's the type of the numpy.array you've told it to use.

Make np_array = numpy.arange(1000000, dtype=numpy.uint64), for example, and your sum will come out OK (although of course there are still limits, with any finite-size number type).
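
For completeness, here is the question's own timing pattern with that dtype applied (a sketch in the same Python 2 style as the question; exact timings will of course vary by machine):

import numpy
import time

np_array = numpy.arange(1000000, dtype=numpy.uint64)   # wide enough to hold the total
start = time.time()
sum_ = np_array.sum()
print time.time() - start, sum_    # still fast, and now prints 499999500000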

You can use dtype=numpy.object to tell numpy that the array holds generic Python objects; of course, performance will decay as generality increases.
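
A quick sketch of that trade-off (the plain builtin object is used for the dtype here, which is equivalent and also works on current numpy releases, where the numpy.object alias has been removed):

>>> import numpy
>>> a = numpy.arange(1000000, dtype=object)   # elements are ordinary Python ints/longs
>>> a.sum() == 499999500000                   # exact, no wraparound, but noticeably slower
True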

Alex Martelli