ansaurus

Question

Help sorting: first by this, and then by that ...

Answer 1

+10 A:

def my_cmp(x, y):
  x1, x2 = x[0].split('_')
  y1, y2 = y[0].split('_')
  return -cmp(x1, y1) or cmp(x2, y2)

my_list = [
    (u'community_news', u'Community: News & Information'), 
    (u'kf_video', u'KF: Video'), 
    (u'community_video', u'Community: Video'), 
    (u'kf_news', u'KF: News & Information'), 
    (u'kf_magazine', u'KF: Magazine')
]

sorted_list = [
    (u'kf_magazine', u'KF: Magazine'),
    (u'kf_news', u'KF: News & Information'), 
    (u'kf_video', u'KF: Video'), 
    (u'community_news', u'Community: News & Information'), 
    (u'community_video', u'Community: Video'), 
]

my_list.sort(cmp=my_cmp)
assert my_list == sorted_list

Ayman Hourieh 2009-11-01 14:21:44

I was just about to edit mine to negate the cmp call instead when you posted your answer. :)

Kylotan 2009-11-01 14:23:17

I've simplified the comparison further to `-cmp(x1, y1) or cmp(x2, y2)`. :)

Ayman Hourieh 2009-11-01 14:26:58

you could also pass the key argument to sort and get rid of the split at the top of your function: my_list.sort(cmp=my_cmp,key=lambda x: x[0].split('_'))

Steven Graham 2009-11-01 14:46:37

that is a beautiful thing, thank you!

thornomad 2009-11-01 15:10:01

Answer 2

+2 A:

>>> def my_cmp(tuple_1, tuple_2):
    xxx_1, yyy_1 = tuple_1[0].split('_')
    xxx_2, yyy_2 = tuple_2[0].split('_')
    if xxx_1 > xxx_2:
     return -1
    elif xxx_1 < xxx_2:
     return 1
    else:
     return cmp(yyy_1, yyy_2)


>>> import pprint
>>> pprint.pprint(sorted(mylist, my_cmp))
[(u'kf_magazine', u'KF: Magazine'),
 (u'kf_news', u'KF: News & Information'),
 (u'kf_video', u'KF: Video'),
 (u'community_news', u'Community: News & Information'),
 (u'community_video', u'Community: Video')]

Not the prettiest solution in the world...

Kylotan 2009-11-01 14:22:20

Answer 3

+8 A:

Custom comparison functions for sorting, as suggested in existing answers, do make it easy to sort in a mix of ascending and descending orders -- but they have serious performance issues and have been removed in Python 3, leaving only the preferred customization approach -- custom key-extraction functions... much speedier, though more delicate to use for the relatively rare use case of mixed ascending/descending sorts.

In Python 2.*, which supports either kind of customization (not both in the same call to sort or sorted:-), a custom comparison function can be passed as a cmp= named argument; or, a custom key-extraction function can be passed as a key= named argument. In Python 3.*, only the latter option is available.

It's definitely worth understanding the key-extraction approach, even if you think you've just solved your problem with a custom-comparison approach instead: not just for performance, but for future-proofness (Python 3) and for generality (the key= approach also applies to min, max, itertools.groupby... much more general than the cmp= approach!).

Key-extraction is very simple when all the key subfields are to be sorted the same way (all ascending, or all descending) -- you just extract them; it's still pretty easy if the subfields that go "the other way" are numbers (you just change their sign while extracting); the delicate case is exactly the one you have -- multiple string fields that must be compared in oppposite ways.

A reasonably simple approach to solving your problem is a tiny shim class:

class Reverser(object):
  def __init__(self, s): self.s = s
  def __lt__(self, other): return other.s < self.s
  def __eq__(self, other): return other.s == self.s

Note that you only have to supply __lt__ and __eq__ (the < and == operators) -- sort and friends synthesize all other comparisons, if needed, based on those two.

So, armed with this little auxiliary tool, we can proceed easily...:

def getkey(tup):
    a, b = tup[0].split('_')
    return Reverser(a), b

my_list.sort(key=getkey)

As you see, once you "get" the reverser and key extraction concepts, you pay essentially no price for using key extraction instead of custom comparison: the code I suggest is 4 statements for the reverser class (which you can write once and put into your "goodies bag" module somewhere), three for the key extraction function, and of course one for the sort or sorted call -- a total of eight vs the 4 + 1 == 5 of the custom comparison approach in the most compact form (i.e. the one using either cmp with a sign change, or cmp with swapped arguments). Three statements are not much of a price to pay for key-extraction's advantages!-)

Performance is clearly not a big issue with such a short list, but with an even modestly longer (10 times) one...:

# my_list as in the Q, my_cmp as per top A, getkey as here

def bycmp():
  return sorted(my_list*10, cmp=my_cmp)

def bykey():
  return sorted(my_list*10, key=getkey)

...

$ python -mtimeit -s'import so' 'so.bykey()'
1000 loops, best of 3: 548 usec per loop
$ python -mtimeit -s'import so' 'so.bycmp()'
1000 loops, best of 3: 995 usec per loop

I.e., the key= approach is already showing a performance gain of almost two times (sorting the list twice as fast) when working on a 50-items list -- well worth the modest price of "8 lines rather than 5", particularly with all the other advantages I already mentioned!

Alex Martelli 2009-11-01 16:01:42

Wow, I like your solution. I didn't know that the cmp= approach had such a penalty.

Steven Graham 2009-11-01 17:22:59

@Steven, tx -- yep, not everybody understands why cmp= was removed in Python 3 (as an "attractive nuisance" which tempts people into suffering a performance penalty!-), that's exactly why I posted this detailed explanation, so thanks for confirming it can help!-)

Alex Martelli 2009-11-01 17:41:34

@Alex: I hesitate to edit one of *your* answers but perhaps my_list.key(cmp=my_cmp) should be my_list.sort(key=getkey) ?

Ned Deily 2009-11-01 19:14:03

@Ned, absolutely, silly me, editing now ("quandoque bonus dormitat Homerus...!-)

Alex Martelli 2009-11-01 19:37:05

This is really nice. +1 for Reverser to be a library class

gnibbler 2009-11-01 19:57:30

@Alex, I think you are doing `bycmp()` a favour! By having so many duplicates the performance is much closer to O(n) that it would be in real life

gnibbler 2009-11-01 20:05:55

@Alex, how is the key extraction technique able to be faster? Does Python store previously extracted keys for an element (with an accompanying cost in memory)?

Kylotan 2009-11-01 21:35:20

@gnibbler, maybe: feel free to benchmark with random data!-). @Kylotan: yes, a single pass (O(N), all cases, in both time and space) extracts all keys, then sorting proper occurs (O(N) in the best case, O(N log N) typical and worst case). `cmp` needs O(N log N) (typical and worst case) extra time. The "cost in memory" is of course totally transient (handled internally in C, free of garbage collection issues, and comparable to and typically less than the input -- and the extra working memory, for `sorted`).

Alex Martelli 2009-11-01 22:29:49

@Alex: it sounds like it could still be a lot of memory in some cases, especially compared to a plain sort which theoretically requires no extra memory allocation. It seems like something you might need to be able to bypass - is this possible? I couldn't find anything elucidating at docs.python.org/3.1 , except that the docs still say "The sort() and reverse() methods modify the sequence in place for economy of space" which would seem to be at odds with taking copies of all the keys!

Kylotan 2009-11-02 10:48:38

@Kylotan, you're probably misled by thinking of "copies" where no copy needs to happen (_references_ are in play). No, it's not possible to "bypass" key extraction when customizing sorting in Python 3 -- exactly because your fears about "a lot of extra memory" are simply overblown. Working in place rather than making a new list while leaving the old one alone halves the total amount of (reference) memory in play _in a way that's not intrinsically transient and does have to deal with gc_. Why not do some measurements rather than play Chicken Little?-)

Alex Martelli 2009-11-02 16:48:53

ansaurus

tags:

views:

answers:

Help sorting: first by this, and then by that ...

related questions