What's wrong with this snippet of code?

import numpy as np
from scipy import stats

d = np.arange(10.0)
cutoffs = [stats.scoreatpercentile(d, pct) for pct in range(0, 100, 20)]
f = lambda x: np.sum(x > cutoffs)
fv = np.vectorize(f)

# why don't these two lines output the same values?
[f(x) for x in d] # => [0, 1, 2, 2, 3, 3, 4, 4, 5, 5]
fv(d)             # => array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Any ideas?

+1  A: 

cutoffs is a list. The numbers that numpy.vectorize extracts from d are all turned into floats before f is applied to them. (It's actually rather odd: it looks like it first tries a numpy float, which works like you want, and then it uses normal Python floats.) By a rather odd, stupid behavior in Python 2, floats are always less than lists, so instead of getting things like

>>> # Here is a vectorized array operation, like you get from numpy. It won't
>>> # happen if you just use a float and a list.
>>> 2.0 > [0.0, 1.8, 3.6, 5.4, 7.2]
[True, True, False, False, False] # not real

you get

>>> # This is an actual copy-paste from a Python interpreter
>>> 2.0 > [0.0, 1.8, 3.6, 5.4, 7.2]
False

To solve the problem, you can make cutoffs a numpy array instead of a list. (You could probably also move the comparison into numpy operations entirely instead of faking it with numpy.vectorize, but I do not know offhand.)
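
Here is a minimal sketch of that fix. It hard-codes the cutoff values from the interpreter session above rather than calling stats.scoreatpercentile, purely to keep the example self-contained. The last line is one possible way to do the comparison with plain numpy broadcasting instead of numpy.vectorize; treat it as a suggestion, not necessarily the best approach.

```python
import numpy as np

d = np.arange(10.0)

# cutoffs as a numpy array instead of a list (values taken from the
# session above; in the original code they come from scoreatpercentile)
cutoffs = np.array([0.0, 1.8, 3.6, 5.4, 7.2])

f = lambda x: np.sum(x > cutoffs)
fv = np.vectorize(f)

# x > cutoffs is now an elementwise comparison even when x is a plain
# Python float, so the loop and the vectorized call agree
looped = [int(f(x)) for x in d]
vectorized = fv(d)

# Alternative without np.vectorize: compare every element of d against
# every cutoff via broadcasting, then count the True values per row
broadcast = (d[:, None] > cutoffs).sum(axis=1)

print(looped)
print(vectorized)
print(broadcast)
```

All three results should hold the same counts as the list comprehension in the question.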

Mike Graham
"floats are always less than lists" is not obviously "stupid": how do you compare things that cannot be compared? (like apples and ideas)
EOL
You don't, which is why this should raise an exception. (In 3.x, comparing a list and a float will raise `TypeError`, in realisation that Python shouldn't do things that make no sense.)
Mike Graham