ansaurus

Question

How do I maximize efficiency with numpy arrays?

Answer 1

+7 A:

a2 is a NumPy array, right? One possible reason it might be taking so long in NumPy (if activity from other processes doesn't account for it, as Wayne Werner suggested) is that you're iterating over the array using a Python loop. At every step of the iteration, Python has to fetch a single value out of the NumPy array and wrap it in a Python-level integer object, which is not a particularly fast operation.
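To make the per-element cost a little more concrete, here is a small check, assuming a recent NumPy on Python 3. The array contents are made up for illustration. It only shows what kind of object the loop receives for each element; it does not measure anything.

```python
import numpy as np

a2 = np.arange(5)  # tiny stand-in array, not the asker's data

# Iterating hands back one NumPy scalar object per element,
# created on the fly each time the loop advances.
first = next(iter(a2))
is_numpy_scalar = isinstance(first, np.integer)

# tolist() converts the whole array to built-in Python ints in a single call.
as_list = a2.tolist()
is_python_int = type(as_list[0]) is int

print(type(first).__name__, type(as_list[0]).__name__)
```

If a Python-level loop really is unavoidable, looping over the result of tolist() may be cheaper than looping over the array itself, but it is worth timing on your own data.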

NumPy works much better when you are able to perform operations on the whole array as a unit. In your case, one option (maybe not even the fastest) would be

file.write("Multiple of 1000!\r\n" * (a2 % 1000 == 0).sum())

Try comparing that to the pure-Python equivalent,

file.write("Multiple of 1000!\r\n" * len(list(filter(lambda i: i % 1000 == 0, a2))))

or

file.write("Multiple of 1000!\r\n" * sum(1 for i in a2 if i % 1000 == 0))
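As a self-contained sketch of the comparison above, the following builds a made-up array and checks that the vectorized count and the generator-expression count agree. The array size and the use of io.StringIO in place of the asker's file are assumptions for illustration only.

```python
import io
import numpy as np

a2 = np.arange(10000)  # hypothetical stand-in for the asker's array

# Vectorized: build a boolean mask over the whole array, then count the True entries.
vectorized_count = int((a2 % 1000 == 0).sum())

# Pure Python: visit every element one at a time in a generator expression.
loop_count = sum(1 for i in a2 if i % 1000 == 0)

out = io.StringIO()  # in-memory stand-in for the output file
out.write("Multiple of 1000!\r\n" * vectorized_count)

print(vectorized_count, loop_count)
```

Both approaches give the same count; the difference is only in where the looping happens (inside NumPy's compiled code versus in the Python interpreter).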

David Zaslavsky 2010-08-03 18:24:25

Answer 2

+4 A:

I'm not surprised that NumPy does poorly compared with Python built-ins on your snippet. A large fraction of NumPy's performance benefit comes from avoiding explicit loops and instead operating on the array as a whole through indexing.

In NumPy, it's more common to do something like this:

import numpy as NP

A = NP.random.randint(10, 100, 100).reshape(10, 10)
w = A[A % 2 == 0]  # boolean mask: keep only the even values
NP.save("test_file.npy", w)
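Applied to the question's multiples-of-1000 test, the same masking idea might look roughly like this. The array here is invented for the example, and np.flatnonzero is only one of several ways to recover the positions.

```python
import numpy as np

a2 = np.arange(5000)  # hypothetical stand-in for the asker's data

mask = a2 % 1000 == 0             # boolean array, True where the element is a multiple of 1000
multiples = a2[mask]              # boolean indexing keeps only the True positions
positions = np.flatnonzero(mask)  # indices of those elements, if they are needed

print(multiples)
```

The mask is computed once for the whole array, so no Python-level loop is involved.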

doug 2010-08-03 18:36:05

+1 for A[A%2==0] which is the type of line that the OP would want to use, except with 1000 instead of 2, of course.

tom10 2010-08-03 19:08:47

Answer 3

+2 A:

Per-element access is very slow for NumPy arrays. Use vectorized operations instead:

$ python -mtimeit -s 'import numpy as np; a2=np.arange(10**6)' '
>    sum(1 for i in a2 if i % 1000 == 0)'
10 loops, best of 3: 1.53 sec per loop

$ python -mtimeit -s 'import numpy as np; a2=np.arange(10**6)' '
>    (a2 % 1000 == 0).sum()'
10 loops, best of 3: 22.6 msec per loop

$ python -mtimeit -s 'import numpy as np; a2=    range(10**6)' '
>    sum(1 for i in a2 if i % 1000 == 0)'
10 loops, best of 3: 90.9 msec per loop
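The same three measurements can be reproduced from inside a script with the standard timeit module. This is a sketch with assumed settings: the array is smaller than the 10**6 used above so that it runs quickly, the function names are invented for the example, and the numbers it prints will differ from machine to machine.

```python
import timeit
import numpy as np

a2 = np.arange(10**5)    # smaller than above so the slow case finishes quickly
py_range = range(10**5)

def loop_over_array():
    # Python-level loop over a NumPy array (the slow case).
    return sum(1 for i in a2 if i % 1000 == 0)

def vectorized():
    # Whole-array operation; the loop runs inside NumPy.
    return int((a2 % 1000 == 0).sum())

def loop_over_range():
    # Python-level loop over built-in integers.
    return sum(1 for i in py_range if i % 1000 == 0)

# Best of 3 repeats, 3 calls per repeat, for each variant.
timings = {
    f.__name__: min(timeit.repeat(f, number=3, repeat=3))
    for f in (loop_over_array, vectorized, loop_over_range)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 3 calls")
```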

J.F. Sebastian 2010-08-03 19:16:04
