views:

101

answers:

4

If I understand correctly, the __cmp__() function of an object is called in order to evaluate all objects in a collection while determining whether an object is a member, or 'in', the collection. However, this does not seem to be the case for sets:

class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data-other.data

a = MyObject(5)
b = MyObject(5)

print a in [b]          //evaluates to True, as I'd expect
print a in set([b])     //evaluates to False

How is an object membership tested in a set, then?

A: 

A set uses a dict behind the scenes, so the "in" statement is checking whether the object exists as a key in the dict. Since your object doesn't implement a hash function, the default hash function for objects uses the object's id. So even though a and b are equivalent, they're not the same object, and that's what's being tested.

Colin
+1  A: 
>>> xs = []
>>> set([xs])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

There you are. Sets use hashes, very similar to dicts. This help performance extremely (membership tests are O(1), and many other operations depend on membership tests), and it also fits the semantics of sets well: Set items must be unique, and different items will produce different hashes, while same hashes indicate (well, in theory) duplicates.

Since the default __hash__ is just id (which is rather stupid imho), two instances of a class that inherits object's __hash__ will never hash to the same value (well, unless adress space is larger than the sizeof the hash).

delnan
+2  A: 

Adding a __hash__ method to your class yields this:

class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data - other.data

    def __hash__(self):
        return hash(self.data)


a = MyObject(5)
b = MyObject(5)

print a in [b] # True
print a in set([b]) # Also True!
Nathon
That's exactly what I was looking for. Thanks :)
jifa
Though that should propably read `return hash(self.data)`.
delnan
// Also - for spotting my non-python-native commenting habits.
jifa
@delnan, good point. Changed.
Nathon
A: 

As others pointed, your objects don't have a _hash_ so they use the default id as a hash, and you can override it as Nathon suggested, BUT read the docs about _hash_, specifically the points about when you should and should not do that.

Ofri Raviv
read them after the previous answer - good stuff :)
jifa

related questions