I would like to delete selected columns from a numpy.array. This is what I do:

In [397]: a = array([[ NaN,   2.,   3., NaN],
   .....:        [  1.,   2.,   3., 9]])

In [398]: print a
[[ NaN   2.   3.  NaN]
 [  1.   2.   3.   9.]]

In [399]: z = any(isnan(a), axis=0)

In [400]: print z
[ True False False  True]

In [401]: delete(a, z, axis = 1)
Out[401]:
 array([[  3.,  NaN],
       [  3.,   9.]])

In this example, my goal is to delete all columns that contain NaNs. I expected the last command to return:

array([[2., 3.],
       [2., 3.]])

How can I do that?

+2  A: 

This creates another array without those columns:

  b = a.compress(logical_not(z), axis=1)
bpowah
cool. I wish matlab's syntax worked here: "a(:,z) = []" is much simpler
bgbg
similar: b = a[:,[1,2]]
bpowah
@bpowah: indeed. The more general way would be b = a[:,~z] (z marks the columns to drop, so it has to be inverted). You might want to update your answer accordingly.
bgbg
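The boolean-mask indexing discussed in these comments can be sketched as below. This is only an illustration using the array from the question; the names `b` and `c` are placeholders, and the `np.delete` variant assumes you pass integer column indices rather than the boolean mask itself.

```python
import numpy as np

a = np.array([[np.nan, 2., 3., np.nan],
              [1.,     2., 3., 9.]])

# True for every column that contains at least one NaN
z = np.isnan(a).any(axis=0)

# Boolean indexing: keep only the columns where z is False
b = a[:, ~z]

# np.delete with explicit integer indices of the columns to drop
c = np.delete(a, np.flatnonzero(z), axis=1)

print(b)
print(c)
```

Both forms build a new array and leave `a` unchanged.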
+1  A: 

Another way is to use masked arrays:

import numpy as np
a = np.array([[ np.nan,   2.,   3., np.nan], [  1.,   2.,   3., 9]])
print(a)
# [[ NaN   2.   3.  NaN]
#  [  1.   2.   3.   9.]]

The np.ma.masked_invalid function returns a masked array with NaNs and infs masked out:

print(np.ma.masked_invalid(a))
# [[-- 2.0 3.0 --]
#  [1.0 2.0 3.0 9.0]]

The np.ma.compress_cols function returns a 2-D array with every column that contains a masked value suppressed:

a = np.ma.compress_cols(np.ma.masked_invalid(a))
print(a)
# [[ 2.  3.]
#  [ 2.  3.]]

See manipulating-a-maskedarray
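Because masked_invalid masks infs as well as NaNs, this approach can drop more columns than a plain isnan test would. A small sketch of the difference, assuming a variant of the example array where the last NaN is replaced by inf (the variable names are made up for illustration):

```python
import numpy as np

# Hypothetical variant of the example: column 0 has a NaN, column 3 an inf
a = np.array([[np.nan, 2., 3., np.inf],
              [1.,     2., 3., 9.]])

# Masked-array route: drops columns containing NaN or inf
via_mask = np.ma.compress_cols(np.ma.masked_invalid(a))

# Plain-ndarray equivalent: keep columns where every entry is finite
via_isfinite = a[:, np.isfinite(a).all(axis=0)]

# NaN-only test: the inf column survives
only_nan = a[:, ~np.isnan(a).any(axis=0)]

print(via_mask.shape, via_isfinite.shape, only_nan.shape)
```

If only NaNs should count as invalid, the isnan-based mask is the safer choice.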

unutbu