views: 83

answers: 3
I am trying to create a 'mask' of a numpy.array by specifying certain criteria. Python even has nice syntax for something like this:

>>> import numpy
>>> A = numpy.array([1,2,3,4,5])
>>> A > 3
array([False, False, False,  True,  True])

But if I have a list of acceptable values instead of a single comparison:

>>> A = numpy.array([1,2,3,4,5])
>>> crit = [1,3,5]

I can't do this:

>>> A in crit

I have to do something based on list comprehensions, like this:

>>> numpy.array([a in crit for a in A])
array([ True, False,  True, False,  True])

Which is correct.

Now, the problem is that I am working with large arrays and the above code is very slow. Is there a more natural way of doing this operation that might speed it up?

EDIT: I was able to get a small speedup by making crit into a set.
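
A minimal sketch of that set-based variant (crit_set is just a name introduced here): the membership test becomes O(1), but the comprehension is still a Python-level loop, so the gain is modest.

import numpy

A = numpy.array([1,2,3,4,5])
crit_set = set([1,3,5])               # set membership tests are O(1)
mask = numpy.array([a in crit_set for a in A])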

EDIT2: For those who are interested:

Jouni's approach: 1000 loops, best of 3: 102 µs per loop

numpy.in1d: 1000 loops, best of 3: 1.33 ms per loop

EDIT3: Just tested again with B = randint(10,size=100)

Jouni's approach: 1000 loops, best of 3: 2.96 ms per loop

numpy.in1d: 1000 loops, best of 3: 1.34 ms per loop

Conclusion: Use numpy.in1d() unless B is very small.
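
For reference, a rough sketch of how timings like these could be reproduced with the standard timeit module (the size of A is an assumption; only B's size is stated above):

import timeit

setup = """
import numpy
from numpy.random import randint
A = randint(10, size=10000)   # size of A assumed; only B's size is given above
B = randint(10, size=100)
"""

jouni = """
mask = numpy.zeros(A.shape, dtype=bool)
for t in B:
    mask = mask | (A == t)
"""

# seconds per loop, 1000 loops each
print(min(timeit.repeat(jouni, setup, number=1000)) / 1000)
print(min(timeit.repeat("numpy.in1d(A, B)", setup, number=1000)) / 1000)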

A: 

Create a mask and use the compress method of the numpy array. It should be much faster. If you have more complex criteria, remember to build the mask with element-wise operations on the array.

a = numpy.array([3,1,2,4,5])
mask = a > 3
b = a.compress(mask)

or

a = numpy.random.random_integers(1,5,100000)
c = a.compress((a<=4)*(a>=2))    ## numbers n with 2 <= n <= 4
d = a.compress(~((a<=4)*(a>=2))) ## numbers n with n > 4 or n < 2
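
For what it's worth, the same selection can also be written with plain boolean indexing, which is the more common idiom; a quick sketch:

import numpy

a = numpy.random.random_integers(1, 5, 100000)
mask = (a >= 2) & (a <= 4)
c = a[mask]     ## numbers n with 2 <= n <= 4
d = a[~mask]    ## numbers n with n < 2 or n > 4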

OK, if you want a mask marking which elements of a are in [1,3,5], you can do something like

a = numpy.random.random_integers(1,5,100000)
mask = (a==1) + (a==3) + (a==5)

or

a = numpy.random.random_integers(1,5,100000)
mask = numpy.zeros(len(a), dtype=bool)
for num in [1,3,5]:
    mask += (a==num)
jimbob
I don't think that this is what I'm looking for. I don't want to get the actual contents of the array back, I just want to get a boolean mask that has the same length as the original array.
aduric
OK, edited it now that I know what you want. I guess Jouni's solution, which he came up with while I was editing mine, is equivalent: True + True = True, True + False = True, False + False = False, exactly the same as using |.
jimbob
+3  A: 

Combine several comparisons with element-wise "or" (the | operator):

A = randint(10,size=10000)
mask = (A == 1) | (A == 3) | (A == 5)

Or if you have a list B and want to create the mask dynamically:

B = [1, 3, 5]
mask = zeros((10000,),dtype=bool)
for t in B: mask = mask | (A == t)
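
As an aside, the same dynamic construction can be written as a single expression with functools.reduce and operator.or_; a minimal sketch:

import operator
from functools import reduce
import numpy
from numpy.random import randint

A = randint(10, size=10000)
B = [1, 3, 5]
# fold element-wise | over all comparisons, starting from an all-False mask
mask = reduce(operator.or_, (A == t for t in B), numpy.zeros(A.shape, dtype=bool))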
Jouni K. Seppänen
@Jouni - just wondering why, or how to anticipate when, `numpy` will naturally do this `ufunc`-enabled element-wise logical operation? When doing logical operations `numpy` sometimes throws an exception: `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().`
dtlussier
@Jouni this is certainly the fastest approach, albeit not the cleanest one.
aduric
+3  A: 

I think that the numpy function in1d is what you are looking for:

>>> A = numpy.array([1,2,3,4,5])
>>> B = [1,3,5]
>>> numpy.in1d(A,B)
array([ True, False,  True, False,  True], dtype=bool)

as stated in its docstring, "in1d(a, b) is roughly equivalent to np.array([item in b for item in a])"

Admittedly, I haven't done any speed tests, but it sounds like what you are looking for.
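
As a side note, newer numpy releases (1.13 and later, if memory serves) also provide numpy.isin(), which covers the same use case and preserves the shape of A; a quick sketch:

import numpy

A = numpy.array([1,2,3,4,5])
B = [1,3,5]
mask = numpy.isin(A, B)   # array([ True, False,  True, False,  True])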

Another faster way

Here's another way to do it which is faster. Turn B (the array containing the elements you are looking for in A) into a numpy array, sort it, and then do:

B[B.searchsorted(A)] == A

though if you have elements in A that are larger than the largest in B, you will need to do:

inds = B.searchsorted(A)
inds[inds == len(B)] = 0
mask = B[inds] == A

It may not be faster for small arrays (especially when B is small), but before long it will definitely be faster. Why? Because this is an O(N log M) algorithm, where N is the number of elements in A and M is the number of elements in B, while putting together a bunch of individual masks is O(N * M). I tested it with N = 10000 and M = 14 and it was already faster. Anyway, just thought that you might like to know, especially if you are truly planning on using this on very large arrays.
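
Putting those steps together, a self-contained sketch of the approach (the helper name is made up here):

import numpy

def in_sorted(A, B):
    # boolean mask of which elements of A appear in B, via binary search
    B = numpy.sort(numpy.asarray(B))   # B must be a sorted numpy array
    inds = B.searchsorted(A)           # O(N log M) lookups
    inds[inds == len(B)] = 0           # guard elements of A larger than B.max()
    return B[inds] == A

A = numpy.random.randint(0, 10, 10000)
mask = in_sorted(A, [1, 3, 5])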

Justin Peel
looks like a recent addition to numpy (wasn't in version 1.3)
bpowah
You are right. I only tested on B having a length of 3. If B is also large, numpy.in1d() definitely scales a lot better.
aduric
@aduric and my second method is even faster than in1d.
Justin Peel