I have an array that might look like this:

ANOVAInputMatrixValuesArray = [[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 
0.53172222], [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]]

Notice that one of the rows has a zero value at the end. I want to delete any row that contains a zero, while keeping any row that contains non-zero values in all cells.

But the array will have different numbers of rows every time it is populated, and the zeros will be located in different rows each time.

I get the number of non-zero elements in each row with the following line of code:

NumNonzeroElementsInRows    = (ANOVAInputMatrixValuesArray != 0).sum(1)

For the array above, NumNonzeroElementsInRows contains: [5 4]

The five indicates that all possible values in row 0 are nonzero, while the four indicates that one of the possible values in row 1 is a zero.

Therefore, I am trying to use the following lines of code to find and delete the rows that contain a zero.

for q in range(len(NumNonzeroElementsInRows)):
    if NumNonzeroElementsInRows[q] < NumNonzeroElementsInRows.max():
        p.delete(ANOVAInputMatrixValuesArray, q, axis=0)

But for some reason, this code does not seem to do anything, even though adding a lot of print statements shows that all of the variables are populated correctly up to that point.

There must be some easy way to simply "delete any row that contains a zero."

Can anyone show me what code to write to accomplish this?

A: 

This is similar to your original approach, and will use less space than unutbu's answer, but I suspect it will be slower.

>>> import numpy as np
>>> p = np.array([[1.5, 0], [1.4,1.5], [1.6, 0], [1.7, 1.8]])
>>> p
array([[ 1.5,  0. ],
       [ 1.4,  1.5],
       [ 1.6,  0. ],
       [ 1.7,  1.8]])
>>> nz = (p == 0).sum(1)
>>> q = p[nz == 0, :]
>>> q
array([[ 1.4,  1.5],
       [ 1.7,  1.8]])

By the way, your line p.delete() doesn't work for me - ndarrays don't have a .delete attribute.
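
If `p` was supposed to be the numpy module, a rough sketch of a delete-based version might look like this - note that `np.delete` returns a new array rather than modifying its argument, so you have to assign the result, and it's safest to collect all of the offending row indices before deleting:

>>> import numpy as np
>>> a = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
...               [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
>>> # indices of rows that contain at least one zero
>>> bad_rows = np.where((a == 0).any(1))[0]
>>> # np.delete returns a copy, so reassign it
>>> a = np.delete(a, bad_rows, axis=0)
>>> a
array([[ 0.96488889,  0.73641667,  0.67521429,  0.592875  ,  0.53172222]])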

mtrw
A bit simpler: `p[~(p==0).any(1)]`, or more explicit for rows: `p[~(p==0).any(1), :]`
@user333700 - That should be an answer - you'd definitely get my vote! `any` is much more clear than `sum` in this use.
mtrw
A: 

Here's a one-liner (yes, it is similar to user333700's, but a little more straightforward):

>>> import numpy as np
>>> arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
...                 [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
>>> arr[arr.all(1)]
array([[ 0.96488889,  0.73641667,  0.67521429,  0.592875  ,  0.53172222]])

By the way, this method is much, much faster than the masked array method for large matrices. For a 2048 x 5 matrix, this method is about 1000x faster.

By the way, user333700's method (from his comment) was slightly faster in my tests, though it boggles my mind why.
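
For what it's worth, here's a rough sketch of how you could time the two approaches yourself with `timeit` (the matrix size, number of zeros, and repeat count are arbitrary):

import numpy as np
import timeit

# 2048 x 5 test matrix with a handful of zeros scattered through it
arr = np.random.rand(2048, 5)
arr[np.random.randint(0, 2048, 50), np.random.randint(0, 5, 50)] = 0

# time the all-based filter against the ~any-based filter
print(timeit.timeit(lambda: arr[arr.all(1)], number=10000))
print(timeit.timeit(lambda: arr[~(arr == 0).any(1)], number=10000))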

Justin Peel
"any" can short-circuit, as soon as the first true case is detected, it can stop, while "all" has to check all conditions. So, not ("~" in numpy) any, should in general be faster than all.
@user333700, both of them can short-circuit, just to different things. `any` short-circuits to true at the first true case detected; `all` short-circuits to false at the first false case detected. In this case, the short-circuiting should be a draw, but doing the extra not should make it slower in my opinion.
Justin Peel