views:

183

answers:

2

Hi all,

I have two arrays, a1 and a2. Assume len(a2) >> len(a1), and that a1 is a subset of a2.

I would like a quick way to return the a2 indices of all elements in a1. The time-intensive way to do this is obviously:

from operator import indexOf
indices = []
for i in a1:
    indices.append(indexOf(a2,i))

This of course takes a long time where a2 is large. I could also use numpy.where() instead (although each entry in a1 will appear just once in a2), but I'm not convinced it will be quicker. I could also traverse the large array just once:

for i in xrange(len(a2)):
    if a2[i] in a1:
        indices.append(i)

But I'm sure there is a faster, more 'numpy' way - I've looked through the numpy method list, but cannot find anything appropriate.

Many thanks in advance,

D

A: 

how about:

wanted = set(a1)
indices =[idx for (idx, value) in enumerate(a2) if value in wanted]

This should be O(len(a1)+len(a2)) instead of O(len(a1)*len(a2))

NB I don't know numpy so there may be a more 'numpythonic' way to do it, but this is how I would do it in pure python.

Dave Kirby
should that be enumerate(a2)?
Dave
Oops, my bad. Fixed it now.
Dave Kirby
A: 

How about

numpy.nonzero(numpy.setmember1d(a2, a1))[0]

This should be fast. From my basic testing, it's about 7 times faster than your second code snippet for len(a2) == 100, len(a1) == 10000, and only one common element at index 45. This assumes that both a1 and a2 have no repeating elements.

Alok
I compared your solution to Dave Kirby's above, with this one being approx 1.35X faster for len(a2) == 12347424, len(a1) == 1338, so this solution get's my vote - thanks!
Dave