tags:

views:

56

answers:

2

Is there any advantage to using numpy when you're doing a large number of operations on lists of binary values? How about integers within a small range (like just the numbers 1,2, and 3?)

+1  A: 

If the number of input values is huge, or if you are doing a lot of operations, you might want to try bitarray. Or, see the bool/int8/uint8 dtype in Numpy's ndarray:

In [1]: import numpy as np
In [2]: data = np.array([0,1,1,0], dtype=bool)
In [3]: data
Out[3]: array([False,  True,  True, False], dtype=bool)
In [4]: data.size
Out[4]: 4
In [5]: data.nbytes
Out[5]: 4
Alok
I have found bitarray to be kind of slow sometimes.
Justin Peel
It could be - I have only heard and read about it, but from the description, it seems like it should be fast. Is it slower than Python lists?
Alok
+3  A: 

Eliminating the loops is the the source of the performance gain (10x):

import profile
import numpy as NP

def np_test(a2darray) :
  row_sums = NP.sum(a2darray, axis=1)
  return NP.sum(row_sums)

def stdlib_test2(a2dlist) :
  return sum([sum(row) for row in a2dlist])

A = NP.random.randint(1, 6, 1e7).reshape(1e4, 1e3)
B = NP.ndarray.tolist(A)

profile.run("np_test(A)")
profile.run("stdlib_test2(B)")

numpy:

  • 10 function calls in 0.025 CPU seconds

lists:

  • 10005 function calls in 0.280 CPU seconds
doug