I have a class called GraphEdge
which I would like to be uniquely defined within a set (the built-in set
type) by its tail
and head
members, which are set via __init__
.
If I do not define __hash__
, I see the following behaviour:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('A', 'B')
>>> hash(E)
139731804758160
>>> hash(H)
139731804760784
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B'), ('A', 'B')])
The set has no way to know that E
and H
are the same by my definition, since they have differing hashes (which is what the set type uses to determine uniqueness, to my knowledge), so it adds both as distinct elements. So I define a rather naive hash function for GraphEdge
like so:
def __hash__( self ):
return hash( self.tail ) ^ hash( self.head )
Now the above works as expected:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('A', 'B')
>>> hash(E)
409150083
>>> hash(H)
409150083
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B')])
But clearly, ('A', 'B')
and ('B', 'A')
in this case will return the same hash, so I would expect that I could not add ('B', 'A')
to a set already containing ('A', 'B')
. But this is not what happens:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('B', 'A')
>>> hash(E)
409150083
>>> hash(H)
409150083
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B'), ('B', 'A')])
So is the set type using the hash or not? If so, how is the last scenario possible? If not, why doesn't the first scenario (no __hash__
defined) work? Am I missing something?
Edit: For reference for future readers, I already had __eq__
defined (also based on tail
and head
).