Hi, I need to sort the following list of tuples in Python:

ListOfTuples = [('10', '2010 Jan 1;', 'Rapoport AM', 'Role of antiepileptic drugs as preventive agents for migraine', '20030417'), ('21', '2009 Nov;', 'Johannessen SI', 'Antiepilepticdrugs in epilepsy and other disorders--a population-based study of prescriptions', '19679449'),...]

I want to order it by descending year (taken from element 1 of each tuple) and by ascending author (element 2):

sorted(result, key = lambda item: (item[1], item[2]))

But it doesn't work. How can I obtain sort stability?

+4  A: 
def descyear_ascauth(atup):
  datestr = atup[1]
  authstr = atup[2]
  year = int(datestr.split(None, 1)[0])
  return -year, authstr

... sorted(result, key=descyear_ascauth) ...

Notes: you need to extract the year as an integer (not as a string), so that you can change its sign -- the latter being the key trick in order to satisfy the "descending" part of the specifications. Squeezing it all within a lambda would be possible, but there's absolutely no reason to do so and sacrifice even more readability, when a def will work just as well (and far more readably).
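
As a minimal runnable sketch of the key function above (Python 3), assuming rows shaped like the question's tuples: the first two rows are shortened from the question, and the third row is hypothetical, added only so that two entries share a year.

```python
# Rows shaped like the question's tuples: (id, date, author, title, pmid).
# The third row is hypothetical, added so that two entries share a year.
rows = [
    ('21', '2009 Nov;', 'Johannessen SI', 'Antiepileptic drugs ...', '19679449'),
    ('10', '2010 Jan 1;', 'Rapoport AM', 'Role of antiepileptic drugs ...', '20030417'),
    ('33', '2010 Mar;', 'Adams B', 'Hypothetical title', '00000000'),
]

def descyear_ascauth(atup):
    datestr = atup[1]
    authstr = atup[2]
    year = int(datestr.split(None, 1)[0])  # '2010 Jan 1;' -> 2010
    return -year, authstr                  # negated year gives descending order

ordered = sorted(rows, key=descyear_ascauth)
authors = [row[2] for row in ordered]
print(authors)  # ['Adams B', 'Rapoport AM', 'Johannessen SI']
```

Both 2010 rows come first, ordered by author, and the 2009 row comes last.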

Alex Martelli
Thank you very much, you are always so kind! :) What approach should I use to add another sort key such as "month"? Should I map month names to a dict ('jan': 1, 'feb': 2)?
Gianluca Bargelli
@Gianluca, using an explicit dict gives you full control, and is therefore what I would recommend. You could play with `list(calendar.month_name)` to build the dict, e.g. in a locale-dependent way, but it's far more complication than warranted unless you have very specific needs in this direction.
Alex Martelli
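
A rough sketch of the explicit-dict idea (Python 3), not taken from any of the answers. The names `MONTHS` and `desc_date_asc_author` and the sample rows are hypothetical, and treating a missing or unrecognised month as 0 is an assumption; with a descending order that places such rows after the dated ones within the same year.

```python
# Hypothetical month table; English three-letter abbreviations are assumed.
MONTHS = {'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6,
          'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12}

def desc_date_asc_author(atup):
    parts = atup[1].rstrip(';').split()   # '2010 Jan 1;' -> ['2010', 'Jan', '1']
    year = int(parts[0])
    # Assumption: a missing or unrecognised month counts as 0.
    month = MONTHS.get(parts[1][:3].lower(), 0) if len(parts) > 1 else 0
    return -year, -month, atup[2]         # year desc, month desc, author asc

# Hypothetical sample rows: (id, date, author).
rows = [
    ('1', '2010 Jan 1;', 'Rapoport AM'),
    ('2', '2009 Nov;', 'Johannessen SI'),
    ('3', '2010 Mar;', 'Adams B'),
    ('4', '2010 Jan;', 'Baker C'),
]

ids = [row[0] for row in sorted(rows, key=desc_date_asc_author)]
print(ids)  # ['3', '4', '1', '2']
```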
Thanks for answering :) . Right now I can't decide which answer to pick, because @Duncan also posted a working approach to my problem. So far it's a matter of taste (readability vs. compactness) and performance (using "tricks" vs. "doing it the Python way")...
Gianluca Bargelli
Both doing two sorts (per @Duncan's idea) and doing a single one with a composite key (my answer) are perfectly Pythonic ways (no tricks involved); however, doing a single sort will save about half the running time (the old-fashioned, near-deprecated `cmp`, as in @THC4k's answer, can be much slower still). Readability and compactness are about the choice of `lambda` (which Duncan mis-spelled) versus `def` (as in my answer), which does not affect speed; as I mentioned, you _can_ squeeze my approach into a `lambda`, it's just a very bad idea to do so.
Alex Martelli
The lambda was added as a late edit (hence the typo) when I realised I couldn't just use itemgetter because the year was part of a longer date. Your answer is probably almost always faster, but if, for example, instead of a year you wanted to reverse-sort strings in a locale-aware manner, it could be messy working out how to do that. Sorting on multiple keys is slower but has the advantage of being clear and straightforward. I think Gianluca should keep both options in his toolbox.
Duncan
A: 

Here is an idiom that works for everything, even things you can't negate, such as strings:

data = [ ('a', 'a'), ('a', 'b'), ('b','a') ]

def sort_func( a, b ):
    # compare tuples with the 2nd entry switched
    # this inverts the sorting on the 2nd entry
    return cmp( (a[0], b[1]), (b[0], a[1]) ) 

print sorted( data )                    # [('a', 'a'), ('a', 'b'), ('b', 'a')]
print sorted( data, cmp=sort_func )     # [('a', 'b'), ('a', 'a'), ('b', 'a')]
THC4k
`cmp` no longer works in Python 3, although there is `cmp_to_key` in functools.
KennyTM
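
A minimal sketch of how the comparison-function idiom above might be carried over to Python 3 with `functools.cmp_to_key`. The local `cmp` helper is an assumption, since Python 3 has no built-in of that name.

```python
from functools import cmp_to_key

def cmp(a, b):
    # Stand-in for the Python 2 built-in: returns -1, 0 or 1.
    return (a > b) - (a < b)

def sort_func(a, b):
    # Compare tuples with the 2nd entry switched;
    # this inverts the sorting on the 2nd entry.
    return cmp((a[0], b[1]), (b[0], a[1]))

data = [('a', 'a'), ('a', 'b'), ('b', 'a')]
result = sorted(data, key=cmp_to_key(sort_func))
print(result)  # [('a', 'b'), ('a', 'a'), ('b', 'a')]
```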
+2  A: 

The easiest way is to sort on each key value separately. Start at the least significant key and work your way up to the most significant.

So in this case:

import operator
ListOfTuples.sort(key=operator.itemgetter(2))
ListOfTuples.sort(key=lambda x: x[1][:4], reverse=True)

This works because Python's sorting is always stable, even when you use the reverse flag: reverse doesn't just sort and then reverse (which would lose stability); it preserves stability after reversing.

Of course if you have a lot of key columns this can be inefficient as it does a full sort several times.

You don't have to convert the year to a number this way, as it's a genuine reverse sort, though you could if you wanted.
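
A small sketch (Python 3) to check the stability claim on hypothetical rows of the form (id, date, author): the two-pass sort should give the same order as a single sort on a composite key. It uses `sorted` rather than `list.sort` only so that the input stays unchanged for the comparison.

```python
import operator

# Hypothetical sample rows: (id, date, author).
rows = [
    ('1', '2009 Nov;', 'Smith J'),
    ('2', '2010 Jan 1;', 'Rapoport AM'),
    ('3', '2009 Feb;', 'Adams B'),
    ('4', '2010 Mar;', 'Adams B'),
]

# Pass 1: least significant key (author, ascending).
two_pass = sorted(rows, key=operator.itemgetter(2))
# Pass 2: most significant key (year, descending); rows with equal
# years keep the author order established by pass 1.
two_pass = sorted(two_pass, key=lambda x: x[1][:4], reverse=True)

# Single pass with a composite key, for comparison.
single_pass = sorted(rows, key=lambda x: (-int(x[1][:4]), x[2]))

print([r[0] for r in two_pass])  # ['4', '2', '3', '1']
print(two_pass == single_pass)   # True
```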

Duncan
Your solution is compact and pythonic but @Alex's is faster. Can't decide who's the winner :)
Gianluca Bargelli
A: 

Here's a rough solution that takes the month abbreviation and the day (if present) into account:

import time

def sortkey(seq):
    strdate, author = seq[1], seq[2]
    spdate = strdate[:-1].split()  # drop the trailing ';' and split on whitespace
    month = time.strptime(spdate[1], "%b").tm_mon
    # year, month, then the day if one is present
    date = [int(spdate[0]), month] + [int(d) for d in spdate[2:]]
    # negate every component so that dates sort in descending order
    return [-d for d in date], author

print(sorted(result, key=sortkey))

"%b" matches the locale's abbreviated month name; you can use a dictionary instead if you prefer not to deal with locales.

tokland
A: 

Here is the lambda version of Alex's answer. I think it looks more compact than Duncan's answer now, but obviously a lot of the readability of Alex's answer has been lost.

sorted(ListOfTuples, key=lambda atup: (-int(atup[1].split(None, 1)[0]), atup[2]))

Readability and efficiency should usually be preferred to compactness.

gnibbler