views:

295

answers:

4

I had a strange bug when porting a feature to the Python 3.1 fork of my program. I narrowed it down to the following hypothesis:

In contrast to Python 2.x, in Python 3.x if an object has a .__eq__ method it is automatically unhashable.

Is this true?

Here's what happens in Python 3.1:

>>> class O(object):
    def __eq__(self, other):
        return 'whatever'


>>> o = O()
>>> d = {o: 0}
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    d = {o: 0}
TypeError: unhashable type: 'O'

The follow-up question is, how do I solve my personal problem? I have an object ChangeTracker which stores a WeakKeyDictionary that points to several objects, giving for each the value of their pickle dump at a certain time point in the past. Whenever an existing object is checked in, the change tracker says whether its new pickle is identical to its old one, therefore saying whether the object has changed in the meantime. Problem is, now I can't even check if the given object is in the library, because it makes it raise an exception about the object being unhashable. (Cause it has a __eq__ method.) How can I work around this?

A: 

I'm no python expert, but wouldn't it make sense that, when you define a eq-method, you also have to define a hash-method as well (which calculates the hash value for an object) Otherwise, the hashing mechanism wouldn't know if it hit the same object, or a different object with just the same hash-value. Actually, it's the other way around, it'd probably end up computing different hash values for objects considered equal by your __eq__ method.

I have no idea what that hash function is called though, __hash__ perhaps? :)

roe
FYI: I drew this sometime ago (http://www.mindmeister.com/10510492/python-underscore): a mindmap of all the underscore methods in python.
jldupont
It's actually vice versa. If two objects have the same hash value, but are not equal - that's fine. The "hashing mechanism" (i.e. the dictionary) first checks the hash, and, on same hash, also compares for equality. The real problem occurs the other way 'round: objects hash differently but still compare the same. The dictionary should find them, but doesn't (or you may get duplicate keys in the dictionary).
Martin v. Löwis
@Martin v. Löwis: I realized that, and added that at the end, or is there a subtle difference to what I said?
roe
+10  A: 

Yes, if you define __eq__, the default __hash__ (namely, hashing the address of the object in memory) goes away. This is important because hashing needs to be consistent with equality: equal objects need to hash the same.

The solution is simple: just define __hash__ along with defining __eq__.

Martin v. Löwis
Let me add for posterity: I defined `__hash__` as `return id(self)`.
cool-RR
@cool-RR: Use of `id(self)` in `__hash__()` in certainly wrong unless you defined `__eq__()` to compare by object identity (which seems meaningless). What about updating your question with real `__eq__()` implementation so that we could suggest `__hash__()` complementing it?
Denis Otkidach
+2  A: 

Check the Python 3 manual on object.__hash__:

If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections.

Emphasis is mine.

If you want to be lazy, it sounds like you can just define __hash__(self) to return id(self):

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns id(x).

Mark Rushakoff
Surely the only reason to overload the equals operator is because two different objects can sometimes compare equal (if not then don't bother overloading it). In this case the `return id(self)` hash function is broken (equal objects must hash the same, see Martin's answer). The "I don't care" hash function would be `return 1`. This is simple and satisfies all the conditions (it's just very inefficient!)
Scott Griffiths
+1  A: 

This paragraph from http://docs.python.org/3.1/reference/datamodel.html#object.%5F%5Fhash%5F%5F

If a class that overrides __eq__() needs to retain the implementation of __hash__() from a parent class, the interpreter must be told this explicitly by setting __hash__ = <ParentClass>.__hash__. Otherwise the inheritance of __hash__() will be blocked, just as if __hash__ had been explicitly set to None.

newacct