views:

200

answers:

4

Is there an easy (meaning without rolling one's own sorting function) way to sort parallel lists without unnecessary copying in Python? For example:

foo = range(5)
bar = range(5, 0, -1)
parallelSort(bar, foo)
print foo # [4,3,2,1,0]
print bar # [1,2,3,4,5]

I've seen the examples using zip but it seems silly to copy all your data from parallel lists to a list of tuples and back again if this can be easily avoided.

A: 

To achieve this, you would have to implement your own sort.

However: Does the unnecessary copying really hurt your application? Often parts of Python strike me as inefficient, too, but they are efficient enough for what I need.

bayer
I see your point about avoiding premature optimization, but sometimes (this being one such case) I like to write generic code and know that if I ever use it on a huge dataset or something it will "just work". In this case I'm more worried about running out of memory than about speed.
dsimcha
and wouldn't *your own sort* involve use of `zip`, `dict`, etc?
SilentGhost
No. Say you implement your own quicksort - you can make sure to make any swaps on both lists.
bayer
I would suspect any parallel sort implemented in Python for this would use a lot more time and a little more memory. The big-O speed and space cannot be improved, and you're adding a lot of overhead to do this in pure Python.
Mike Graham
@dimcha, If you have a large dataset and are running out of memory, the solution may be this, but it may be to use a numpy array, or an `array.array`, or a database, etc. Consider whether this is the solution that will really help.
Mike Graham
@Mike - Why would it use more memory?
bayer
@bayer, Because you would have to keep track of what you are doing in Python with several new Python objects, but the sorting implementation used in CPython has been optimized in C using C variables when possible. Like I said, I "suspect" this; I don't have a rigorous proof, let alone a shred of evidence.
Mike Graham
That additional memory would certainly be O(1) and not O(n). I am totally with you, using the Python sort stuff is smarter. However, if he has very hard memory requirements, this might be a different story...
bayer
+2  A: 

Is there an easy way? Yes. Use zip.

Is there an "easy way that doesn't use a zip variant"? No.

If you wanted to elaborate on why you object to using zip, that would be helpful. Either you're copying objects, in which case Python will copy by reference, or you're copying something so lightweight into a lightweight tuple as to not be worthy of optimization.

If you really don't care about execution speed but are especially concerned for some reason about memory pressure, you could roll your own bubble sort (or your sort algorithm of choice) on your key list which swaps both the key list and the target lists elements when it does a swap. I would call this the opposite of easy, but it would certainly limit your working set.

Jeffrey Harris
Just because you can't think of an easy way that doesn't use zip doesn't mean there isn't one - see my answer. :)
Nick Johnson
Your answer is zipping by another name, so I stand behind "there is no easy way that doesn't use a zip variant". This was a silly question to begin with, though, so if sorting in memory what are essentially tuples of (sort_value, index) is preferable to sorting tuples of (sort_value, target_value), fine.
Jeffrey Harris
"Zipping by another name"? It's certainly not - it doesn't have anything to do with zipping, and doesn't modify the original elements at all. Indeed, it doesn't even touch the second array.
Nick Johnson
Zipping just means constructing an iterable of tuples from two or more source iterables. What do you think sorted(indices, key=foo) is doing? It's constructing a list of tuples of (foo_val, index), then sorting them. That is indistinguishable from zipping (foo_val, target_val) and sorting.
Jeffrey Harris
A: 

Any solution I can imagine short of introducing a sort from scratch uses indices, or a dict, or something else that really is not apt to save you memory. In any event, using zip will only increase memory usage by a constant factor, so it is worth making sure this is really a problem before a solution.

If it does get to be a problem, there may be more effective solutions. Since the elements of foo and bar are so closely related, are you sure their right representation is not a list of tuples? Are you sure they should not be in a more compact data structure if you are running out of memory, such as a numpy array or a database (the latter of which is really good at this kind of manipulation)?

(Also, incidentally, itertools.izip can save you a little bit of memory over zip, though you still end up with the full zipped list in list form as the result of sorted.)

Mike Graham
+6  A: 

Here's an easy way:

perm = sorted(xrange(len(foo)), key=lambda x:foo[x])

This generates a list of permutations - the value in perm[i] is the index of the ith smallest value in foo. Then, you can access both lists in order:

for p in perm:
  print "%s: %s" % (foo[p], bar[p])

You'd need to benchmark it to find out if it's any more efficient, though - I doubt it makes much of a difference.

Nick Johnson
Change `range` to `xrange` if you want to make a difference. Unless you're using Python 3.
Chris Lutz
Hm, true. Or use .sort instead of sorted, but that ruins the one-liner-ness. ;)
Nick Johnson