views:

41

answers:

3

I have to store objects that have two attributes (ida and idb) inside a dict. Both attributes are 64 bit positive integers and I can only store one object for a unique arrangement(combination in which the order matters) of ida and idb. For example:

obj1 = SomeClass(ida=5223372036854775807, idb=2)
obj2 = SomeClass(ida=2, idb=5223372036854775807)
obj3 = SomeClass(ida=5223372036854775807, idb=2)

Since the objects themselves are mutable, I'm using hashes of tuples '(ida, idb,)' as keys. Following the example, consider this:

d = {}
# storing obj1
k1 = hash((obj1.ida, obj1.idb,))
d[k1] = obj1
# storing obj2
k2 = hash((obj2.ida, obj2.idb,))
d[k2] = obj2
# storing obj3
k3 = hash((obj3.ida, obj3.idb,)) # note that k3 hash should be the same as k1
d[k3] = obj3
#at this point 'd' should contain only 'obj2' and 'obj3' since 'k1' == 'k3'

I tested the above example in my 64 bit machine and it worked as expected.

I have the following questions:

  • Is it possible for two tuples of big integers to compute the same hash if they have differences either in elements or in ordering?
  • What about in 32 bit platforms?
  • If not, is there a platform independent way to guarantee that the above example will work no matter the values of ida and idb?
A: 

There are only 2**<word size> possible hashes, so you would have to run a 128-bit version of Python to even store all (2**64)**2 possible hashes in the first place. And yes, it could still be possible to have collisions. Use a set if you need to store unique objects; just define __hash__() and __eq__() in a sane manner.

Ignacio Vazquez-Abrams
+2  A: 
In [33]: hash?

Return a hash value for the object. Two objects with the same value have the same hash value. The reverse is not necessarily true, but likely.

Why not just use the tuple (ida,idb) as the key?

import pprint
class SomeClass(object):
    def __init__(self,ida,idb):
        self.ida=ida
        self.idb=idb
obj1 = SomeClass(ida=5223372036854775807, idb=2)
obj2 = SomeClass(ida=2, idb=5223372036854775807)
obj3 = SomeClass(ida=5223372036854775807, idb=2)

d={}
for obj in (obj1,obj2,obj3):
    d[obj.ida,obj.idb]=obj

pprint.pprint(d)
# {(2, 5223372036854775807L): <__main__.SomeClass object at 0xb78839ec>,
   (5223372036854775807L, 2): <__main__.SomeClass object at 0xb7883a0c>}
unutbu
I dont understand how this work, arent 3 tuples created unique? if so, how can the third tuple replace the first(since they are suposed to be different objects) ?
Thiado de Arruda
They have `__hash__()` and `__eq__()` defined properly, so Python knows that they have the same value.
Ignacio Vazquez-Abrams
Ok, now I get it, thanks :)
Thiado de Arruda
+1  A: 

Like Ignacio said, you can have hash collisions. Why don't you just use the tuple itself? Tuples are immutable and it looks like your ida and idb are (immutable) integers.

Nathon