views: 665
answers: 2

Python 2.x has two ways to overload comparison operators: __cmp__, or the "rich comparison" methods such as __lt__. The rich comparison overloads are said to be preferred, but why is this so?

Each rich comparison operator is simpler to implement on its own, but you must implement several of them with nearly identical logic. However, if you can use the builtin cmp function and tuple ordering, then __cmp__ stays quite simple and fulfills all the comparisons:

class A(object):
    def __init__(self, name, age, other):
        self.name = name
        self.age = age
        self.other = other

    def __cmp__(self, other):
        assert isinstance(other, A)  # assumption for this example
        # Tuple comparison applies cmp to each field, in order.
        return cmp((self.name, self.age, self.other),
                   (other.name, other.age, other.other))

This simplicity seems to meet my needs much better than overloading all 6(!) of the rich comparisons. (However, you can get it down to "just" 4 if you rely on the "swapped argument"/reflected behavior, but that results in a net increase of complication, in my humble opinion.)

Are there any unforeseen pitfalls I need to be made aware of if I only overload __cmp__?

I understand the <, <=, ==, etc. operators can be overloaded for other purposes, and can return any object they like. I am not asking about the merits of that approach, but only about differences when using these operators for comparisons in the same sense that they mean for numbers.

Update: As Christopher pointed out, cmp is disappearing in 3.x. Are there any alternatives that make implementing comparisons as easy as the above __cmp__?

+5  A: 

This is covered in the PEP: http://www.python.org/dev/peps/pep-0207/

Also, cmp goes away in Python 3.0. (Note that it is not present on http://docs.python.org/3.0/reference/datamodel.html but it IS on http://docs.python.org/reference/datamodel.html )

Christopher
The PEP is only concerned with why rich comparisons are needed, e.g. in the way NumPy users want A<B to return a sequence.
Roger Pate
I had not realized it's definitely going away, this makes me sad. (But thanks for pointing that out.)
Roger Pate
The PEP also discusses "why" they are preferred. Essentially it boils down to efficiency: 1. No need to implement operations that make no sense for your object (like ordering for unordered collections). 2. Some collections have very efficient implementations of certain comparisons. Rich comparisons let the interpreter take advantage of that if you define them.
Christopher
Re 1, If they don't make sense, then don't implement __cmp__. Re 2, having both options can let you optimize as needed, while still quickly prototyping and testing. Neither one tells me why it's removed. (Essentially it boils down to developer efficiency for me.) Is it possible the rich comparisons are less efficient with the __cmp__ fallback in place? That wouldn't make sense to me.
Roger Pate
With __cmp__ you implicitly define an order. There is no "not equal" result. It is either less than, greater, or equal. Consider comparing dictionaries. You want to know if they are "equal", which could be defined as having the same keys with the same values. One dictionary doesn't really have a natural "order" when compared to another. So implementing __cmp__ for them yields a useless, perhaps dangerous ordering side effect when all you want to know is if they are equal or not.
Christopher
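Christopher's dict point is easy to check directly in Python 3, where dicts kept equality but dropped ordering entirely (a quick illustrative snippet, not from the thread):

```python
# Python 3: dicts define equality, but ordering comparisons raise.
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

print(a == b)  # True: same keys with the same values

try:
    a < b
except TypeError as exc:
    # Rather than invent an arbitrary order (as Python 2 did),
    # Python 3 refuses the comparison outright.
    print("no ordering:", exc)
```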
@R. Pate, as I try to explain in my answer, there's no real loss in generality (since a mixin, decorator, or metaclass, lets you easily define everything in terms of just < if you wish) and so for all Python implementations to carry around redundant code falling back to __cmp__ forevermore -- just to let Python users express things in two equivalent ways -- would run 100% against the grain of Python.
Alex Martelli
@Alex, yes, that ease of just defining one function in the common/simple case, __lt__ with your code, is what I was looking for once I realized I'd have to stop using __cmp__ eventually.
Roger Pate
@Christopher, I understand that, perhaps I wasn't clear, as I didn't mean to say that defining __cmp__ would be acceptable for those types.
Roger Pate
+12  A: 

Yep, it's easy to implement everything in terms of e.g. __lt__ with a mixin class (or a metaclass, or a class decorator if your taste runs that way).

For example:

class ComparableMixin:
    def __eq__(self, other):
        return not self < other and not other < self
    def __ne__(self, other):
        return self < other or other < self
    def __gt__(self, other):
        return other < self
    def __ge__(self, other):
        return not self < other
    def __le__(self, other):
        return not other < self

Now your class can define just __lt__ and multiply inherit from ComparableMixin (after whatever other bases it needs, if any). A class decorator would be quite similar, just inserting similar functions as attributes of the new class it's decorating (the result might be microscopically faster at runtime, at equally minute cost in terms of memory).

Of course, if your class has some particularly fast way to implement (e.g.) __eq__ and __ne__, it should define them directly so the mixin's versions are not used (for example, that is the case for dict) -- in fact __ne__ might well be defined to facilitate that as:

def __ne__(self, other):
  return not self == other

but in the code above I wanted to keep the pleasing symmetry of only using <;-). As to why __cmp__ had to go, since we did have __lt__ and friends, why keep another, different way to do exactly the same thing around? It's just so much dead-weight in every Python runtime (Classic, Jython, IronPython, PyPy, ...). The code that definitely won't have bugs is the code that isn't there -- whence Python's principle that there ought to be ideally one obvious way to perform a task (C has the same principle in the "Spirit of C" section of the ISO standard, btw).

This doesn't mean we go out of our way to prohibit things (e.g., near-equivalence between mixins and class decorators for some uses), but it definitely does mean that we don't like to carry around code in the compilers and/or runtimes that redundantly exists just to support multiple equivalent approaches to perform exactly the same task.

Further edit: there's actually an even better way to provide comparison AND hashing for many classes, including that in the question -- a __key__ method, as I mentioned on my comment to the question. Since I never got around to writing the PEP for it, you must currently implement it with a Mixin (&c) if you like it:

class KeyedMixin:
    def __lt__(self, other):
        return self.__key__() < other.__key__()
    # ...and so on for the other comparators, as above, plus:
    def __hash__(self):
        return hash(self.__key__())

It's a very common case for an instance's comparisons with other instances to boil down to comparing a tuple of a few fields from each -- and then, hashing should be implemented on exactly the same basis. The __key__ special method addresses that need directly.
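A fleshed-out sketch of that idea (__key__ is the proposed convention from the answer, not a real special method; the Person class and its fields are my own invented example):

```python
class KeyedMixin:
    """Comparisons and hashing all delegate to a single __key__() method."""
    def __lt__(self, other):
        return self.__key__() < other.__key__()
    def __le__(self, other):
        return self.__key__() <= other.__key__()
    def __eq__(self, other):
        return self.__key__() == other.__key__()
    def __ne__(self, other):
        return self.__key__() != other.__key__()
    def __hash__(self):
        # Hashing on the same key guarantees equal objects hash equal.
        return hash(self.__key__())

class Person(KeyedMixin):
    """Hypothetical example: compares and hashes by (name, age)."""
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __key__(self):
        return (self.name, self.age)

p1, p2 = Person("Ann", 30), Person("Ann", 30)
print(p1 == p2)              # True: same key tuple
print(hash(p1) == hash(p2))  # True: hashing uses the same key
print(len({p1, p2}))         # 1: equal objects collapse in a set
```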

Alex Martelli
for example what?
Dustin Getz
@Dustin, sorry, accidental click -- hope the ample edit completing my answer remedies that.
Alex Martelli
I thought it was something similar and kept reloading every few minutes myself to see when it was posted, thanks. :)
Roger Pate
Sorry for the delay @R. Pate, I decided that since I was having to edit anyway I should provide the most thorough answer I could rather than rushing (and I've just edited again to suggest my old __key__ idea which I never got around to PEPping, as well as how to implement it with a mixin).
Alex Martelli
I really like that __key__ idea, going to use it and see how it feels. (Though named cmp_key or _cmp_key instead of a reserved name.)
Roger Pate