Hi all,

I am just getting to know numpy, and I am impressed by its claims of C-like efficiency with memory access in its ndarrays. I wanted to see the differences between these and regular Python lists for myself, so I ran a quick timing test, performing a few of the same simple tasks with numpy and without it. Numpy outclassed regular lists by an order of magnitude in the allocation of and arithmetic operations on arrays, as expected. But this segment of code, identical in both tests, took about 1/8 of a second with a regular list, and slightly over 2.5 seconds with numpy:

file = open('timing.log','w')
for num in a2:
    if num % 1000 == 0:
        file.write("Multiple of 1000!\r\n")

file.close()

Does anyone know why this might be, and if there is some other syntax I should be using for operations like this to take better advantage of what the ndarray can do?

Thanks...

EDIT: To answer Wayne's comment... I timed them both repeatedly and in different orders and got pretty much identical results each time, so I doubt it's another process. I put

start = time()
at the top of the file after the numpy import and then I have statements like
print 'Time after traversal:\t',(time() - start)
throughout.

+7  A: 

a2 is a NumPy array, right? One possible reason it might be taking so long in NumPy (if other processes' activity doesn't account for it, as Wayne Werner suggested) is that you're iterating over the array using a Python loop. At every step of the iteration, Python has to fetch a single value out of the NumPy array and wrap it in a new scalar object (a NumPy integer, not a plain Python int), which is not a particularly fast operation.

NumPy works much better when you are able to perform operations on the whole array as a unit. In your case, one option (maybe not even the fastest) would be

file.write("Multiple of 1000!\r\n" * (a2 % 1000 == 0).sum())

Try comparing that to the pure-Python equivalent,

file.write("Multiple of 1000!\r\n" * len(list(filter(lambda i: i % 1000 == 0, a2))))

or

file.write("Multiple of 1000!\r\n" * sum(1 for i in a2 if i % 1000 == 0))
David Zaslavsky
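For reference, the whole-array approach from the answer above can be put together as a small standalone script. This is only a sketch: the array built with np.arange is a hypothetical stand-in, since the thread never shows how a2 was created, and the with-statement is a substitution for the explicit open/close in the question.

```python
import numpy as np

# Hypothetical stand-in for the asker's array; the real a2 is not shown.
a2 = np.arange(10**6)

# The comparison runs in compiled code and yields a boolean array;
# summing it counts the True entries.
count = int((a2 % 1000 == 0).sum())

# Build the whole output once and write it in a single call.
text = "Multiple of 1000!\r\n" * count
with open('timing.log', 'w') as log:
    log.write(text)
```

The Python-level loop disappears entirely here, so no per-element objects are created.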
+4  A: 

I'm not surprised that NumPy does poorly w/r/t Python built-ins when using your snippet. A large fraction of the performance benefit in NumPy arises from avoiding explicit loops and instead accessing the array by indexing.

In NumPy, it's more common to do something like this:

import numpy as NP
A = NP.random.randint(10, 100, 100).reshape(10, 10)
w = A[A % 2 == 0]
NP.save("test_file.npy", w)
doug
+1 for A[A%2==0] which is the type of line that the OP would want to use, except with 1000 instead of 2, of course.
tom10
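Following the comment above, here is a minimal sketch of the same boolean-mask indexing with 1000 in place of 2. The sample array is made up for illustration; it is not the asker's data.

```python
import numpy as np

# Hypothetical sample data; the asker's a2 is not shown in the thread.
a2 = np.arange(5000)

mask = a2 % 1000 == 0   # boolean array, one entry per element of a2
multiples = a2[mask]    # keeps only the elements where mask is True

print(multiples.tolist())  # [0, 1000, 2000, 3000, 4000]
```

If only the count is needed, mask.sum() avoids building the second array.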
+2  A: 

Per-element access is very slow for numpy arrays. Use vector operations:

$ python -mtimeit -s 'import numpy as np; a2=np.arange(10**6)' '
>    sum(1 for i in a2 if i % 1000 == 0)'
10 loops, best of 3: 1.53 sec per loop

$ python -mtimeit -s 'import numpy as np; a2=np.arange(10**6)' '
>    (a2 % 1000 == 0).sum()'
10 loops, best of 3: 22.6 msec per loop

$ python -mtimeit -s 'import numpy as np; a2=    range(10**6)' '
>    sum(1 for i in a2 if i % 1000 == 0)'
10 loops, best of 3: 90.9 msec per loop
J.F. Sebastian
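The same three-way comparison can also be run from inside a script with the standard timeit module instead of the command line. This is a rough sketch: the function names are invented for illustration, and the array size is reduced from the 10**6 used above purely to keep the run short. Absolute timings will differ by machine and by Python and NumPy version.

```python
import timeit
import numpy as np

N = 10**5  # smaller than the 10**6 above so the script finishes quickly
arr = np.arange(N)
lst = list(range(N))

def loop_count(seq):
    # Python-level loop: handles one element at a time
    return sum(1 for i in seq if i % 1000 == 0)

def vector_count(a):
    # Whole-array operation: modulo, comparison and sum run in compiled code
    return int((a % 1000 == 0).sum())

for label, func, data in [
    ("loop over ndarray", loop_count, arr),
    ("vectorized ndarray", vector_count, arr),
    ("loop over list", loop_count, lst),
]:
    seconds = timeit.timeit(lambda: func(data), number=5)
    print("%-20s %.4f s for 5 runs" % (label, seconds))
```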