I see the answers have already done an excellent job; I just wanted to point out one coding inefficiency in Human Sort. To apply a selective char-by-char translation to a Unicode string s, it uses the code:
spec_dict = {'Å':'A', 'Ä':'A'}
def spec_order(s):
    return ''.join([spec_dict.get(ch, ch) for ch in s])
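For context, here's a minimal sketch of how a key function like this is typically plugged into the sort itself (the sample names are made up purely for illustration):

names = ['Äbel', 'Ada', 'Åke']            # made-up sample data
print(sorted(names))                      # ['Ada', 'Äbel', 'Åke'] -- raw code points
print(sorted(names, key=spec_order))      # ['Äbel', 'Ada', 'Åke'] -- Å/Ä treated as A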
Python has a much better, faster and more concise way to perform this auxiliary task (on Unicode strings -- the analogous method for byte strings has a different and somewhat less helpful specification!-):
spec_dict = dict((ord(k), spec_dict[k]) for k in spec_dict)
def spec_order(s):
    return s.translate(spec_dict)
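A quick, purely illustrative check that the translate-based key does the same thing as the join-based one:

print(spec_order('Ångström'))     # 'Angström' -- Å is mapped, ö is left alone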
The dict you pass to the translate method has Unicode ordinals (not strings) as keys, which is why we need that rebuilding step from the original char-to-char spec_dict. (Values in the dict you pass to translate [as opposed to keys, which must be ordinals] can be Unicode ordinals, arbitrary Unicode strings, or None to remove the corresponding character as part of the translation, so it's easy to specify "ignore a certain character for sorting purposes", "map ä to ae for sorting purposes", and the like.)
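For instance, here's a small sketch (the particular mappings, and names like sort_table, are just illustrative) that uses all three kinds of values in one table:

sort_table = {
    ord('ä'): 'ae',      # map ä to ae for sorting purposes
    ord('å'): ord('a'),  # an ordinal is also fine as a value
    ord('-'): None,      # ignore hyphens for sorting purposes
}

def sort_key(s):
    return s.translate(sort_table)

print(sort_key('Smörgås-bord'))   # 'Smörgasbord' -- å mapped, hyphen dropped, ö untouched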
In Python 3, you can get the "rebuilding" step more simply, e.g.:
spec_dict = ''.maketrans(spec_dict)
See the docs for other ways you can use this maketrans static method in Python 3.
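For example, besides the single-dict form used above, maketrans also accepts two equal-length strings (a char-for-char mapping) plus an optional third string of characters to delete; the specific characters below are just illustrative:

table = str.maketrans({'Å': 'A', 'Ä': 'A'})    # dict form, as above
table = str.maketrans('ÅÄ', 'AA')              # two equal-length strings
table = str.maketrans('ÅÄ', 'AA', '-')         # third string: characters to remove

print('Åbc-Ädef'.translate(table))             # 'AbcAdef'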