I want to select only certain rows from a numpy array based on the value in the second column. For example, this test array has integers from 1 to 10 in the second column.

>>> test = numpy.array([numpy.arange(100), numpy.random.randint(1, 11, 100)]).transpose()
>>> test[:10, :]
array([[ 0,  6],
       [ 1,  7],
       [ 2, 10],
       [ 3,  4],
       [ 4,  1],
       [ 5, 10],
       [ 6,  6],
       [ 7,  4],
       [ 8,  6],
       [ 9,  7]])

If I want only rows where the second value is 4, it is easy:

>>> test[test[:, 1] == 4]
array([[ 3,  4],
       [ 7,  4],
       [16,  4],
       ...
       [81,  4],
       [83,  4],
       [88,  4]])

But how do I achieve the same result when there is more than one wanted value? The wanted list can be of arbitrary length; for example, I may want all rows where the second column is either 2, 4 or 6.

>>> wanted = [2, 4, 6]

The only way I have come up with is to use a list comprehension and then convert the result back into an array. It works, but it seems too convoluted.

>>> test[numpy.array([test[x, 1] in wanted for x in range(len(test))])]
array([[ 0,  6],
       [ 3,  4],
       [ 6,  6],
       ...
       [90,  2],
       [91,  6],
       [92,  2]])

Is there a better way to do this in numpy itself that I am missing?

+2  A: 
test[numpy.logical_or.reduce([test[:,1] == x for x in wanted])]

This should be faster than the original version, since numpy does the inner loops instead of Python.
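
For reference, a minimal self-contained sketch of this approach (with test and wanted as defined in the question); the reduce is equivalent to writing the per-value comparisons out by hand and OR-ing them together:

import numpy

test = numpy.array([numpy.arange(100), numpy.random.randint(1, 11, 100)]).transpose()
wanted = [2, 4, 6]

# Build one boolean mask per wanted value and OR them all together
mask = numpy.logical_or.reduce([test[:, 1] == x for x in wanted])
# Equivalent to: (test[:, 1] == 2) | (test[:, 1] == 4) | (test[:, 1] == 6)

print(test[mask])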

Amnon
This solution goes through the array len(wanted) times. It is usually faster to go through the array in a single pass.
EOL
Thanks Amnon. This is the solution that I decided to accept. I think it is easy to understand and is about 20x faster than my original solution.
Raja
+4  A: 

The following solution has the advantage of going through your array only once:

@numpy.vectorize
def selected(elmt): return elmt in wanted

print test[selected(test[:, 1])]

It is also fast because it uses NumPy's fast loops. You also get the optimization of the in operator: once an element matches, the remaining elements do not have to be tested (as opposed to the "logical or" approach, where all the elements in wanted are tested, possibly unnecessarily).
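
As a self-contained sketch of this vectorized-membership idea (test as in the question; making wanted a set is an optional tweak so each membership test is constant-time):

import numpy

test = numpy.array([numpy.arange(100), numpy.random.randint(1, 11, 100)]).transpose()
wanted = {2, 4, 6}  # a set speeds up the "in" test; a plain list works too

@numpy.vectorize
def selected(elmt):
    # Called once per element of the second column, returns True/False
    return elmt in wanted

print(test[selected(test[:, 1])])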

Alternatively, you could use the following one-liner, which also goes through your array only once:

test[numpy.apply_along_axis(lambda x: x[1] in wanted, 1, test)]

This is much, much slower, though, as it extracts the element in the second column at each iteration (instead of doing it in one pass, as in the first solution).

EOL
These solutions call Python for every element instead of using numpy's comparison. According to my tests, your first solution is faster than mine for len(wanted)=50 but slower for len(wanted)=5.
Amnon
EOL, many thanks for your time and effort. Your explanations were clear. I chose to use Amnon's solution because for my usual scenario (len(test) about 1000 and len(wanted) about 3-5), that was faster than your first solution. The speed difference is not huge, but I also found it clearer. But it was good to be reminded of numpy's vectorize and I am sure I will find a use for it soon.
Raja
@Amnon: Good point, and interesting results. Thanks!
EOL
A: 

This is ten times faster than Amnon's variant for len(test)=1000:

wanted = (2,4,6)
wanted2 = numpy.expand_dims(wanted, 1)
print test[numpy.any(test[:, 1] == wanted2, 0), :]
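
To spell out the broadcasting, here is a self-contained sketch (test and wanted as above): wanted2 has shape (3, 1), so comparing it with the length-100 second column broadcasts to a (3, 100) boolean array, and numpy.any along axis 0 collapses that to one boolean per row. Newer numpy releases also ship numpy.in1d, which does the membership test in a single call:

import numpy

test = numpy.array([numpy.arange(100), numpy.random.randint(1, 11, 100)]).transpose()
wanted = (2, 4, 6)

wanted2 = numpy.expand_dims(wanted, 1)      # column vector, shape (3, 1)
mask = numpy.any(test[:, 1] == wanted2, 0)  # (3, 100) comparison collapsed along axis 0
print(test[mask, :])

# Alternative, if your numpy version provides it:
# mask = numpy.in1d(test[:, 1], wanted)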
Antony Hatchkins
@ahatchkins: Some typos in your version. What you are suggesting is this: wanted = numpy.array(wanted).reshape((len(wanted), 1)) # convert the wanted list into a one-column array, then print test[numpy.any(test[:, 1] == wanted, 0)]. In my tests, this is about 2 times faster than Amnon's solution.
Raja
Yes, there was a typo: s/wanted2/wanted/ . Fixed
Antony Hatchkins
Hmm, yes. You are right. Two typos in fact. And only 2 times faster. )
Antony Hatchkins