views:

1511

answers:

6

I need to sort two arrays simultaneously, or rather I need to sort one of the arrays and bring the corresponding element of its associated array with it as I sort. That is if the array is [(5, 33), (4, 44), (3, 55)] and I sort by the first axis (labeled below dtype='alpha') then I want: [(3.0, 55.0) (4.0, 44.0) (5.0, 33.0)]. These are really big data sets and I need to sort first ( for nlog(n) speed ) before I do some other operations. I don't know how to merge my two separate arrays though in the proper manner to get the sort algorithm working. I think my problem is rather simple. I tried three different methods:

import numpy
x=numpy.asarray([5,4,3])
y=numpy.asarray([33,44,55])

dtype=[('alpha',float), ('beta',float)]

values=numpy.array([(x),(y)])
values=numpy.rollaxis(values,1)
#values = numpy.array(values, dtype=dtype)
#a=numpy.array(values,dtype=dtype)
#q=numpy.sort(a,order='alpha')
print "Try 1:\n", values

values=numpy.empty((len(x),2))
for n in range (len(x)):
        values[n][0]=y[n]
        values[n][1]=x[n]
print "Try 2:\n", values
#values = numpy.array(values, dtype=dtype)
#a=numpy.array(values,dtype=dtype)
#q=numpy.sort(a,order='alpha')

###
values = [(x[0], y[0]), (x[1],y[1]) , (x[2],y[2])]
print "Try 3:\n", values
values = numpy.array(values, dtype=dtype)
a=numpy.array(values,dtype=dtype)
q=numpy.sort(a,order='alpha')

print "Result:\n",q

I commented out the first and second trys because they create errors, I knew the third one would work because that was mirroring what I saw when I was RTFM. Given the arrays x and y (which are very large, just examples shown) how do I construct the array (called values) that can be called by numpy.sort properly?

*** Zip works great, thanks. Bonus question: How can I later unzip the sorted data into two arrays again?

+2  A: 

I think you just need to specify the axis that you are sorting on when you have made your final ndarray. Alternatively argsort one of the original arrays and you'll have an index array that you can use to look up in both x and y, which might mean you don't need values at all.

(scipy.org seems to be unreachable right now or I would post you a link to some docs)

Given that your description doesn't quite match your code snippet it's hard to say with certainty, but I think you have over-complicated the creation of your numpy array.

Simon
+6  A: 

I think what you want is the zip function. If you have

x = [1,2,3]
y = [4,5,6]

then zip(x,y) == [(1,4),(2,5),(3,6)]

So your array could be constructed using

a = numpy.array(zip(x,y), dtype=dtype)
Il-Bhima
+1: zip works well with generators, so you don't have to create giant in-memory lists, you can use generator functions instead.
S.Lott
How can I later unzip the sorted data into two arrays again?
Alex
once you have `a` above, you can do `c, d = zip(*a)` to unzip.
Autoplectic
+1  A: 

I couldn't get a working solution using Numpy's sort function, but here's something else that works:

import numpy
x = [5,4,3]
y = [33,44,55]
r = numpy.asarray([(x[i],y[i]) for i in numpy.lexsort([x])])

lexsort returns the permutation of the array indices which puts the rows in sorted order. If you wanted your results sorted on multiple keys, e.g. by x and then by y, use numpy.lexsort([x,y]) instead.

David Zaslavsky
+3  A: 

for your bonus question -- zip actually unzips too:

In [1]: a = range(10)
In [2]: b = range(10, 20)
In [3]: c = zip(a, b)
In [4]: c
Out[4]: 
[(0, 10),
 (1, 11),
 (2, 12),
 (3, 13),
 (4, 14),
 (5, 15),
 (6, 16),
 (7, 17),
 (8, 18),
 (9, 19)]
In [5]: d, e = zip(*c)
In [6]: d, e
Out[6]: ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19))
Autoplectic
+2  A: 

Simon suggested argsort as an alternative approach; I'd recommend it as the way to go. No messy merging, zipping, or unzipping: just access by index.

idx = numpy.argsort(x)
ans = [ (x[idx[i]],y[idx[i]]) for i in idx]
Vicki Laidler
A: 

zip() might be inefficient for large arrays. numpy.dstack() could be used instead of zip:

ndx = numpy.argsort(x)
values = numpy.dstack(x[ndx], y[ndx])
J.F. Sebastian