ansaurus

Question

Numpy - why value error for NaN when trying to delete rows

Answer 1

+5 A:

Is simply generating a new array not an option?

numpy.array([x for x in A if x[0] in li])
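For illustration, here is the same one-liner run on a small sample. The contents of `A` and `li` below are assumed sample data (first column holds the row IDs), not necessarily the asker's real array.

```python
import numpy as np

# Assumed sample data: the first column holds the row IDs.
A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN'],
              ['id3', '5', '1', 'NaN']])
li = ['id1', 'id3', 'id6']

# Build a new array from only the rows whose ID appears in li.
kept = np.array([x for x in A if x[0] in li])
print(kept[:, 0])  # IDs of the rows that were kept
```

The original `A` is left untouched; `kept` is a separate array.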

atomocopter 2010-10-06 14:16:36

Yes, much simpler than my solution!

eumiro 2010-10-06 14:19:13

I think the original poster wanted to retain the rows where `row[0]` is in `li`, so you need to remove the `not` from the condition in your list comprehension.

dtlussier 2010-10-06 15:29:33

@dtlussier: thanks for pointing out my mistake. :)

atomocopter 2010-10-06 21:24:01

Answer 2

+2 A:

It appears you want to delete a row of your array in place. However, this is not possible with the np.delete function, because such an operation goes against the way that Python and NumPy manage memory.

I found an interesting post on the Numpy mailing list (Travis Oliphant, [Numpy-discussion] Deleting a row from a matrix) where the np.delete function is first discussed:

So, "in-place" deletion of array objects would not be particularly useful, because it would only work for arrays with no additional reference counts (i.e. simple b=a assignment would increase the reference count and make it impossible to say del a[obj]).

....

But, the problem with both of those approaches is that once you start removing arbitrary rows (or n-1 dimensional sub-spaces) from an array you very likely will no longer have a chunk of memory that can be described using the n-dimensional array memory model.

If you take a look at the documentation for np.delete (http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html), you can see that the function returns a new array with the desired parts (not necessarily rows) deleted.

Definition:       np.delete(arr, obj, axis=None)
Docstring:
Return a new array with sub-arrays along an axis deleted.

Parameters
----------
arr : array_like
  Input array.
obj : slice, int or array of ints
  Indicate which sub-arrays to remove.
axis : int, optional
  The axis along which to delete the subarray defined by `obj`.
  If `axis` is None, `obj` is applied to the flattened array.

Returns
-------
out : ndarray
    A copy of `arr` with the elements specified by `obj` removed. Note
    that `delete` does not occur in-place. If `axis` is None, `out` is
    a flattened array.
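
A minimal sketch of what "does not occur in-place" means in practice. The array `a` here is made-up example data.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # three rows, two columns
b = np.delete(a, 1, axis=0)      # new array without row 1

print(a.shape)  # the original array is untouched
print(b.shape)  # the result has one row fewer
```

The result only replaces the original if you assign it back to the same name, e.g. `a = np.delete(a, 1, axis=0)`.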

So, in your case I think you'll want to do something like:

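import numpy as np

A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN']])

li = ['id1', 'id3', 'id6']

# Collect the indices of the rows to drop first, then delete them in one call.
# Calling np.delete inside the loop would shift the indices of the remaining rows.
drop = [i for i, row in enumerate(A) if row[0] not in li]
A = np.delete(A, drop, axis=0)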

A is now cut down as you wanted, but remember that it is a new piece of memory: every call to np.delete allocates a new array, and the name A is rebound to point to it.

I'm sure there is a better, vectorized way (maybe using masked arrays?) to find out which rows to delete, but I couldn't work it out. If anyone has one, please comment!
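
One possible vectorized approach, offered as a sketch rather than a definitive answer, is a boolean mask built with `np.isin`. This assumes NumPy 1.13 or newer (older versions have `np.in1d` instead) and that the IDs sit in the first column, as in the example above.

```python
import numpy as np

A = np.array([['id1', '1', '2', 'NaN'],
              ['id2', '2', '0', 'NaN']])
li = ['id1', 'id3', 'id6']

# Boolean mask: True where the ID in column 0 appears in li.
mask = np.isin(A[:, 0], li)

# Boolean indexing returns a new array containing only the masked rows.
A = A[mask]
```

As with np.delete, the result is a new array rather than an in-place modification.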

dtlussier 2010-10-06 15:04:17
