ansaurus

Question

Finding a list of indices from master array using secondary array with non-unique entries

Answer 1

A:

I'm not sure if there is a built-in way to do this in Python, but you're probably best off sorting the two arrays and then generating your output in one pass through b. The complexity of that operation should be O(|a|*log|a|) + O(|b|*log|b|) + O(|b|) = O(|b|*log|b|) (assuming |b| > |a|). I believe your original attempt has complexity O(|a|*|b|), so this should give a noticeable improvement for a sufficiently large b.
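A minimal sketch of the sort-then-single-pass idea, in plain Python. The function name `indices_by_sorting` and the sample data are made up for illustration, and the sketch assumes every element of b actually occurs in a (it raises `KeyError` otherwise). It sorts index positions rather than the values themselves, so the original indices into a are preserved.

```python
def indices_by_sorting(a, b):
    """Return, for each element of b, the index of that element in a."""
    # Sort positions rather than values so the original indices are kept.
    a_order = sorted(range(len(a)), key=lambda i: a[i])
    b_order = sorted(range(len(b)), key=lambda j: b[j])

    result = [None] * len(b)
    k = 0  # cursor into a_order; it only ever moves forward
    for j in b_order:
        # Skip entries of a that are smaller than the current b value.
        while k < len(a_order) and a[a_order[k]] < b[j]:
            k += 1
        if k == len(a_order) or a[a_order[k]] != b[j]:
            raise KeyError(b[j])  # assumption violated: b[j] is not in a
        result[j] = a_order[k]
    return result


a = [30, 10, 20, 40]
b = [20, 20, 40, 10, 30, 10]
print(indices_by_sorting(a, b))  # [2, 2, 3, 1, 0, 1]
```

Because `sorted` is stable, repeated values in a would resolve to their first occurrence. Repeated values in b are handled naturally, since the cursor does not advance while the b value stays the same.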

VeeArr 2010-06-11 16:16:07

Answer 2

+1 A:

Your current approach with where searches through the whole of a for each element of b. You can make each look-up O(1) instead of O(N) by using a dict. For instance, I used the following method:

import numpy

def method2(a, b):
    # Map each value in a to its index so that each look-up is O(1) on average.
    tmpdict = dict(zip(a, range(len(a))))
    return numpy.array([tmpdict[bi] for bi in b])

and got a very large speed-up, which should be even better for larger arrays. For the sizes in your example code, I got a speed-up of about 15x. The only problem with my code is that if there are repeated elements in a, the dict will point to the last instance of each element, whereas your method points to the first instance. However, that can be remedied if repeated elements are expected in actual usage of the code.
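One possible remedy for the repeated-element case, sketched in plain Python: build the dict so that only the first index seen for each value is stored. The name `first_index_lookup` and the sample data are hypothetical; the result can be wrapped in `numpy.array(...)` if an array is needed.

```python
def first_index_lookup(a, b):
    """Return, for each element of b, the index of its first occurrence in a."""
    first_index = {}
    for i, value in enumerate(a):
        # setdefault stores the index only the first time a value is seen,
        # so later repeats in a do not overwrite it.
        first_index.setdefault(value, i)
    return [first_index[bi] for bi in b]


a = [5, 7, 5, 9]  # 5 is repeated at indices 0 and 2
b = [5, 9, 7, 5]
print(first_index_lookup(a, b))  # [0, 3, 1, 0]
```

As with the dict version above, a value in b that is missing from a raises `KeyError`.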

Justin Peel 2010-06-11 16:42:21

Very nice. Thank you. This works well since there are no repeated elements in `a` as it is a list of unique id numbers.

fideli 2010-06-12 21:23:17

@fideli, That's what I'd guessed, but your example with random numbers didn't rule out repeats. Glad that I could help.

Justin Peel 2010-06-12 21:33:55
