ansaurus

Question

removing pairs of elements from numpy arrays that are NaN (or another value) in Python

Answer 1

+1 A:

I think list comprehensions should do this. E.g.,

import math
new_a = [(val1, val2) for (val1, val2) in a if not (math.isnan(val1) or math.isnan(val2))]
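
A self-contained sketch of this approach, written so that it keeps only the pairs with no NaN. It assumes `a` is a two-column array (any iterable of pairs would do); the sample data is made up for illustration.

```python
import math
import numpy as np

# Sample data (an assumption for illustration): four pairs, two containing NaN.
a = np.array([[1, 10], [5, 6], [np.nan, 6], [6, np.nan]])

# Keep a pair only if neither of its elements is NaN.
new_a = [(val1, val2) for (val1, val2) in a
         if not (math.isnan(val1) or math.isnan(val2))]
```

Note that the result is a plain Python list of tuples; `np.array(new_a)` would turn it back into an array if needed.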

ig0774 2010-04-23 01:00:41

You'd actually need to make the test something like `val1 != val1` because `nan == nan` returns false. But +1.

David Zaslavsky 2010-04-23 01:03:00

I edited the above... honestly, I never have had need of `NaN`, so I was mostly addressing the form of building a list. Based on the documentation here: http://docs.python.org/library/math.html#math.isnan, I think this ought to work...

ig0774 2010-04-23 01:08:41

Shouldn't it be: `... in a where not math.isnan(val1) and not math.isnan(val2)`?

2010-04-23 01:22:20

You've been doing too much SQL. :) Use 'if' instead of 'where'.

Mark Dickinson 2010-04-23 08:38:13

Answer 2

+1 A:

You could convert the array into a masked array, and use the compress_rows method:

import numpy as np
a = np.array([[1, 5, np.nan, 6],
              [10, 6, 6, np.nan]])
a = np.transpose(a)
print(a)
# [[  1.  10.]
#  [  5.   6.]
#  [ NaN   6.]
#  [  6.  NaN]]
b = np.ma.compress_rows(np.ma.fix_invalid(a))
print(b)
# [[  1.  10.]
#  [  5.   6.]]
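
To see what the two calls do separately, here is a minimal sketch using the same sample values. The variable names `masked` and `mask` are introduced here for illustration only.

```python
import numpy as np

a = np.array([[1, 10], [5, 6], [np.nan, 6], [6, np.nan]])

# fix_invalid returns a masked array in which NaN (and infinite) entries are masked.
masked = np.ma.fix_invalid(a)

# getmaskarray always returns a full boolean array, True where an entry is masked.
mask = np.ma.getmaskarray(masked)

# compress_rows drops every row that contains at least one masked entry
# and returns an ordinary ndarray.
b = np.ma.compress_rows(masked)
```

Here `mask` is True only at the two NaN positions, so the last two rows are the ones removed.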

unutbu 2010-04-23 03:15:52

+1: wow! I don't often see masked arrays being used and suggested! Good one!

EOL 2010-04-23 09:19:56

Answer 3

+2 A:

Not to detract from ig0774's answer, which is perfectly valid and Pythonic and is in fact the normal way of doing these things in plain Python, but: numpy supports a boolean indexing system which could also do the job.

new_a = a[(a==a).all(1)]

I'm not sure offhand which way would be more efficient (or faster to execute).
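
The `a == a` test relies on NaN being the only floating-point value that compares unequal to itself. A minimal sketch, assuming the sample array used elsewhere in this thread:

```python
import numpy as np

a = np.array([[1, 10], [5, 6], [np.nan, 6], [6, np.nan]])

# NaN != NaN, so (a == a) is False exactly where a holds NaN.
not_nan = (a == a)

# Keep only the rows in which every element passed the test.
new_a = a[not_nan.all(axis=1)]
```

The mask `not_nan` is the same as `~np.isnan(a)`, which some readers may find more explicit.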

If you wanted to use a different condition to select the rows, this would have to be changed, and precisely how depends on the condition. If it's something that can be evaluated for each array element independently, you could just replace the `a==a` with the appropriate test. For example, to eliminate all rows containing numbers larger than 100, you could do

new_a = a[(a<=100).all(1)]

But if you're trying to do something fancy that involves all the elements in a row (like eliminating all rows that sum to more than 100), it might be more complicated. If that's the case, I can try to edit in a more specific answer if you want to share your exact condition.
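
For the row-sum case mentioned above, one possible sketch is to reduce along each row first and then index with the resulting one-dimensional boolean array. The sample data and the threshold of 100 are assumptions for illustration.

```python
import numpy as np

# Sample data (made up): the second row sums to 110.
a = np.array([[1.0, 10.0],
              [50.0, 60.0],
              [99.0, 0.5]])

# One sum per row; comparing it gives one boolean per row.
keep = a.sum(axis=1) <= 100

# Boolean indexing on the first axis selects whole rows.
new_a = a[keep]
```

If the array may contain NaN, a row's sum is NaN and the comparison is False, so such rows would be dropped as well; `np.nansum` could be used instead if NaN should be ignored in the sum.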

David Zaslavsky 2010-04-23 03:21:42

+1: for the vectorized approach. This is almost always faster and is one of the main reasons for using numpy.

tom10 2010-04-23 04:44:32

Answer 4

+5 A:

If you want to take only the rows that have no NaNs, this is the expression you need:

>>> import numpy as np
>>> a[~np.isnan(a).any(1)]
array([[  1.,  10.],
       [  5.,   6.]])

If you want the rows that do not have a specific number among their elements, e.g. 5:

>>> a[~(a == 5).any(1)]
array([[  1.,  10.],
       [ NaN,   6.],
       [  6.,  NaN]])

The latter is clearly equivalent to

>>> a[(a != 5).all(1)]
array([[  1.,  10.],
       [ NaN,   6.],
       [  6.,  NaN]])
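
The equivalence is De Morgan's law applied per row: "no element equals 5" is the same as "every element differs from 5". A quick check, assuming the same sample array:

```python
import numpy as np

a = np.array([[1, 10], [5, 6], [np.nan, 6], [6, np.nan]])

# "Not any element equals 5" ...
mask_any = ~(a == 5).any(axis=1)

# ... versus "all elements differ from 5".
mask_all = (a != 5).all(axis=1)

# Both masks select the same rows, NaN entries included (NaN != 5 is True).
same = np.array_equal(mask_any, mask_all)
```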

Explanation: Let's first create your example input

>>> import numpy as np
>>> a = np.array([[1, 5, np.nan, 6],
...               [10, 6, 6, np.nan]]).transpose()
>>> a
array([[  1.,  10.],
       [  5.,   6.],
       [ NaN,   6.],
       [  6.,  NaN]])

This determines which elements are NaN:

>>> np.isnan(a)
array([[False, False],
       [False, False],
       [ True, False],
       [False,  True]], dtype=bool)

This identifies which rows have any element that is True:

>>> np.isnan(a).any(1)
array([False, False,  True,  True], dtype=bool)

Since we don't want these, we negate the last expression:

>>> ~np.isnan(a).any(1)
array([ True,  True, False, False], dtype=bool)

And finally we use the boolean array to select the rows we want:

>>> a[~np.isnan(a).any(1)]
array([[  1.,  10.],
       [  5.,   6.]])
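
The two cases (NaN or another value) can be folded into one small helper. This is only a sketch: the function name `drop_rows_containing` and its signature are hypothetical, not part of numpy.

```python
import numpy as np

def drop_rows_containing(a, value=np.nan):
    """Return the rows of 2-D array `a` that do not contain `value`.

    Hypothetical helper for illustration; NaN needs np.isnan because
    NaN never compares equal to anything, including itself.
    """
    a = np.asarray(a)
    if isinstance(value, float) and np.isnan(value):
        bad = np.isnan(a)
    else:
        bad = (a == value)
    # A row is dropped if any of its elements matched.
    return a[~bad.any(axis=1)]

a = np.array([[1, 10], [5, 6], [np.nan, 6], [6, np.nan]])
no_nan = drop_rows_containing(a)      # rows without NaN
no_five = drop_rows_containing(a, 5)  # rows without the value 5
```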

krawyoti 2010-04-23 14:06:25

+1: Super clear and helpful explanation, and I like ~np.isnan as it spells out what you are doing.

tom10 2010-04-23 18:20:14
