views:

426

answers:

4

If I have two parallel lists and want to sort them by the order of the elements in the first, it's very easy:

>>> a = [2, 3, 1]
>>> b = [4, 6, 2]
>>> a, b = zip(*sorted(zip(a,b)))
>>> print a
(1, 2, 3)
>>> print b
(2, 4, 6)

How can I do the same using numpy arrays without unpacking them into conventional Python lists?

+5  A: 

b[a.argsort()] should do the trick.

Here's how it works. First you need to find a permutation that sorts a. argsort is a method that computes this:

>>> a = numpy.array([2, 3, 1])
>>> p = a.argsort()
>>> p
[2, 0, 1]

You can easily check that this is right:

>>> a[p]
array([1, 2, 3])

Now apply the same permutation to b.

>>> b = numpy.array([4, 6, 2])
>>> b[p]
array([2, 4, 6])
Jason Orendorff
This doesn't use `b` for "auxiliary sorting", for example when `a` has elements that repeat. Please see my answer for details.
Alok
+4  A: 

Here's an approach that creates no intermediate Python lists, though it does require a NumPy "record array" to use for the sorting. If your two input arrays are actually related (like columns in a spreadsheet) then this might open up an advantageous way of dealing with your data in general, rather than keeping two distinct arrays around all the time, in which case you'd already have a record array and your original problem would be answered merely by calling sort() on your array.

This does an in-place sort after packing both arrays into a record array:

>>> from numpy import array, rec
>>> a = array([2, 3, 1])
>>> b = array([4, 6, 2])
>>> c = rec.fromarrays([a, b])
>>> c.sort()
>>> c.f1   # fromarrays adds field names beginning with f0 automatically
array([2, 4, 6])

Edited to use rec.fromarrays() for simplicity, skip redundant dtype, use default sort key, use default field names instead of specifying (based on this example).

Peter Hansen
Thanks! I really wish I could accept two answers. This one is less simple but more general. I've upvoted it though, as the least I could do :-)
YGA
+1  A: 

This might the simplest and most general way to do what you want. (I used three arrays here, but this will work on arrays of any shape, whether two columns or two hundred).

import numpy as NP
fnx = lambda : NP.random.randint(0, 10, 6)
a, b, c = fnx(), fnx(), fnx()
abc = NP.column_stack((a, b, c))
keys = (abc[:,0], abc[:,1])          # sort on 2nd column, resolve ties using 1st col
indices = NP.lexsort(keys)        # create index array
ab_sorted = NP.take(abc, indices, axis=0)

One quirk w/ lexsort is that you have to specify the keys in reverse order, i.e., put your primary key second and your secondary key first. In my example, i want to sort using the 2nd column as the primary key so i list it second; the 1st column resolves ties only, but it is listed first).

doug
nice catch Brendan, thanks.
doug
+1  A: 

Jason's solution works if a has no repeating values. If a has values that repeat, the method is not equivalent to the zip(*sorted(zip(a,b))) method (it only sorts on a, whereas the zip method sorts on both a and b):

>>> a = numpy.array([1, 2, 3, 1, 1, 1])
>>> b = numpy.array([9, 4, 6, 2, 5, 7])
>>> zip(*sorted(zip(a,b)))
[(1, 1, 1, 1, 2, 3), (2, 5, 7, 9, 4, 6)]
>>> b[a.argsort()]
array([9, 2, 5, 7, 4, 6]) # could be different for different python/numpy versions.

If that is okay with you, then that seems to be the easiest method. Otherwise, use Peter's solution:

>>> c = numpy.rec.fromarrays([a, b], names='a,b')
>>> c.sort()
>>> c.a
array([1, 1, 1, 1, 2, 3])
>>> c.b
array([2, 5, 7, 9, 4, 6])
Alok