Folks,

Is there a collection of gotchas where Numpy differs from Python, points that have puzzled people and cost them time?

"The horror of that moment I shall never never forget !"
"You will, though," the Queen said, "if you don't make a memorandum of it."

For example, NaNs are always trouble, anywhere. If you can explain this without running it, give yourself a point --

from numpy import array, NaN, isnan

pynan = float("nan")
print pynan is pynan, pynan is NaN, NaN is NaN
a = (0, pynan)
print a, a[1] is pynan, any([aa is pynan for aa in a])

a = array(( 0, NaN ))
print a, a[1] is NaN, isnan( a[1] )

(I'm not knocking numpy, lots of good work there, just think a FAQ or Wiki of gotchas would be useful.)

Edit: I was hoping to collect half a dozen gotchas (surprises for people learning Numpy).
Then, if there are common gotchas or, better, common explanations, we could talk about adding them to a community Wiki (where?). It doesn't look like we have enough so far.

+1  A: 
print pynan is pynan, pynan is NaN, NaN is NaN

This tests identity, that is, whether both names refer to the same object. The result should therefore be True, False, True, because float(whatever) creates a new float object each time you call it.

a = (0, pynan)
print a, a[1] is pynan, any([aa is pynan for aa in a])

I don't know what it is that you find surprising with this.

a = array(( 0, NaN ))
print a, a[1] is NaN, isnan( a[1] )

This I did have to run. :-) When you stick NaN into an array it's converted into a numpy.float64 object, which is why a[1] is NaN fails.

This all seems fairly unsurprising to me. But then I don't really know anything much about NumPy. :-)

Lennart Regebro
+6  A: 

NaN is not a singleton like None, so you can't really use the is check on it. What makes it a bit tricky is that NaN == NaN is False as IEEE-754 requires. That's why you need to use the numpy.isnan() function to check if a float is not a number. Or the standard library math.isnan() if you're using Python 2.6+.
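To make the difference concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The variable names are made up for illustration, and it assumes a Numpy version that provides np.nan.

```python
import math
import numpy as np

py_nan = float("nan")          # a fresh Python float holding NaN
arr = np.array([0.0, np.nan])  # the array stores the value, not the object

# Identity and equality are both unreliable ways to detect NaN.
same_object = arr[1] is np.nan      # indexing builds a new numpy.float64
equal_to_itself = py_nan == py_nan  # IEEE-754: NaN compares unequal to itself

# Value-based tests work for Python floats and Numpy scalars alike.
found_by_math = math.isnan(py_nan)
found_by_numpy = bool(np.isnan(arr[1]))
nan_mask = np.isnan(arr)            # elementwise test over the whole array

print(same_object, equal_to_itself, found_by_math, found_by_numpy, nan_mask)
```

Both the is check and the == check come out False here, while math.isnan() and numpy.isnan() report the NaN.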

Ants Aasma
Well it's in the definition of NaN. def isnan(x): return (x != x)
kaizer.se
+3  A: 

I think this one is funny:

>>> import numpy as n
>>> a = n.array([[1,2],[3,4]])
>>> a[1], a[0] = a[0], a[1]
>>> a
array([[1, 2],
       [1, 2]])

For Python lists this does of course work as intended:

>>> b = [[1,2],[3,4]]
>>> b[1], b[0] = b[0], b[1]
>>> b
[[3, 4], [1, 2]]

The funny thing is that numpy itself had a bug in the shuffle function, because it used that notation :-) See here for more on this.

The reason is of course that in the first case we are dealing with views of the array, so the values are overwritten in-place.
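Spelled out step by step, the tuple assignment behaves roughly like the sketch below (Python 3 syntax; the names row0 and row1 are only for illustration). The second half shows one possible way to swap rows, using list indexing so that the right-hand side is copied before anything is overwritten.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# The right-hand side is built first: two views into the memory of `a`.
row0, row1 = a[0], a[1]
a[1] = row0   # a is now [[1, 2], [1, 2]]; row1 views that same row, so it changed too
a[0] = row1   # writes [1, 2] over row 0, so nothing visible changes
naive_result = a.tolist()

# Indexing with a list (fancy indexing) copies the rows before the assignment.
b = np.array([[1, 2], [3, 4]])
b[[0, 1]] = b[[1, 0]]
swapped_result = b.tolist()

print(naive_result, swapped_result)
```

With Python lists the right-hand side holds references to two separate row objects, so nothing is overwritten before it is used.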

nikow
Could you explain how the numpy array ends up being so?
sundar
+5  A: 

The biggest gotcha for me was that almost every standard operator is overloaded to distribute across the array.

Define a list and an array

>>> l = range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> import numpy
>>> a = numpy.array(l)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Multiplication duplicates the python list, but distributes over the numpy array

>>> l * 2
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a * 2
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Adding a scalar and dividing are not defined on Python lists (list + list is concatenation, nothing else)

>>> l + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "int") to list
>>> a + 2
array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> l / 2.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'list' and 'float'
>>> a / 2.0
array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])

Numpy sometimes treats lists like arrays: when an array and a list meet in an operation, the list is converted to an array first

>>> a + a
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
>>> a + l
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
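The same comparison can be collected in one place. This is just a sketch in Python 3 syntax with made-up variable names, not an exhaustive table of operators.

```python
import numpy as np

lst = list(range(5))
arr = np.array(lst)

repeated = lst * 2   # list: repetition
doubled = arr * 2    # array: elementwise multiplication
joined = lst + lst   # list: concatenation
summed = arr + arr   # array: elementwise addition
mixed = arr + lst    # the list is converted to an array, then added elementwise

# A list plus a scalar has no meaning for plain Python lists.
try:
    lst + 2
    list_plus_scalar = "ok"
except TypeError:
    list_plus_scalar = "TypeError"

print(repeated, doubled, joined, summed, mixed, list_plus_scalar)
```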
Christian Oudard
Yes, that got me too. A simple table with columns: op, python, numpy would settle that.
Denis
A: 

from Neil Martinsen-Burrell in numpy-discussion 7 Sept --

The ndarray type available in Numpy is not conceptually an extension of Python's iterables. If you'd like to help other Numpy users with this issue, you can edit the documentation in the online documentation editor at numpy-docs

Denis
+1  A: 

The truth value of a Numpy array differs from that of a python sequence type, where any non-empty sequence is true.

>>> import numpy as np
>>> l = [0,1,2,3]
>>> a = np.arange(4)
>>> if l: print "Im true"
... 
Im true
>>> if a: print "Im true"
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>>

Numerical types are true when they are non-zero, and as a collection of numbers the Numpy array inherits this definition. But for a collection of numbers, truth could reasonably mean "all elements are non-zero" or "at least one element is non-zero". Numpy refuses to guess which definition is meant and raises the above exception. The .any() and .all() methods let you state which meaning of true you intend.

>>> if a.any(): print "Im true"
... 
Im true
>>> if a.all(): print "Im true"
... 
>>>
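For reference, a small sketch of the explicit tests in Python 3 syntax. The a.size check is an addition here, assuming that what you wanted from "if a:" was the list-style "is it non-empty?" test.

```python
import numpy as np

a = np.arange(4)   # array([0, 1, 2, 3]); it contains one zero

# bool() on a multi-element array raises instead of guessing.
try:
    bool(a)
    ambiguous = False
except ValueError:
    ambiguous = True

has_nonzero = bool(a.any())   # at least one element is non-zero
all_nonzero = bool(a.all())   # every element is non-zero
is_non_empty = a.size > 0     # closest match to the truth value of a list

print(ambiguous, has_nonzero, all_nonzero, is_non_empty)
```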
nmb
+1  A: 

Slicing creates views, not copies.

>>> l = [1, 2, 3, 4]
>>> s = l[2:3]
>>> s[0] = 5
>>> l
[1, 2, 3, 4]

>>> from numpy import array
>>> a = array([1, 2, 3, 4])
>>> s = a[2:3]
>>> s[0] = 5
>>> a
array([1, 2, 5, 4])
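A short sketch (Python 3 syntax, made-up names) of how to check whether a result shares memory with the original. It assumes np.shares_memory is available, which needs a reasonably recent Numpy. Note that indexing with a list of positions returns a copy, unlike a basic slice.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

view = a[2:3]             # basic slice: shares memory with `a`
fancy = a[[2]]            # indexing with a list: independent copy
explicit = a[2:3].copy()  # an explicit copy of a slice

view_shares = np.shares_memory(a, view)
fancy_shares = np.shares_memory(a, fancy)

fancy[0] = 7
explicit[0] = 8
after_copies = a.tolist()  # writes to the copies do not reach `a`

view[0] = 5
after_view = a.tolist()    # the write through the view does

print(view_shares, fancy_shares, after_copies, after_view)
```

Calling .copy() on a slice gives list-like behaviour when that is what you want.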
Roberto Bonvallet
Not always: "There are two kinds of fancy indexing in numpy, which behave similarly ..." http://mail.scipy.org/pipermail/numpy-discussion/2008-January/031101.html
Denis
A: 

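What caught me out is that multiplying a list does not copy its elements: the new list just holds repeated references to the same objects. That is harmless for immutable elements such as ints, but not for arrays.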

>>> import numpy as np
>>> a = [0]*5
>>> a
[0, 0, 0, 0, 0]
>>> a[2] = 1
>>> a
[0, 0, 1, 0, 0]
>>> b = [np.ones(3)]*5
>>> b
[array([ 1.,  1.,  1.]), array([ 1.,  1.,  1.]), array([ 1.,  1.,  1.]), array([ 1.,  1.,  1.]), array([ 1.,  1.,  1.])]
>>> b[2][1] = 2
>>> b
[array([ 1.,  2.,  1.]), array([ 1.,  2.,  1.]), array([ 1.,  2.,  1.]), array([ 1.,  2.,  1.]), array([ 1.,  2.,  1.])]

So if you create a list of elements like this and intend to do different operations on them you are scuppered ...

A straightforward solution is to iteratively create each of the arrays (using a 'for loop' or list comprehension) or use a higher dimensional array (where e.g. each of these 1D arrays is a row in your 2D array, which is generally faster).
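Both suggestions might look something like this (Python 3 syntax; the names shared, independent and grid are only for illustration):

```python
import numpy as np

shared = [np.ones(3)] * 5                     # five references to ONE array
independent = [np.ones(3) for _ in range(5)]  # five separate arrays
grid = np.ones((5, 3))                        # one 2D array, one row per former list item

shared[2][1] = 2
independent[2][1] = 2
grid[2, 1] = 2

# Count how many rows ended up with the modified element.
shared_changed = int(sum(row[1] == 2 for row in shared))
independent_changed = int(sum(row[1] == 2 for row in independent))
grid_changed = int((grid[:, 1] == 2).sum())

print(shared_changed, independent_changed, grid_changed)
```

Only the multiplied list changes in all five places; the list comprehension and the 2D array each change in exactly one.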

sillyMunky