Is there a way to generate a hash-like ID for objects in Python that is based solely on the objects' attribute values? For example,

class test:
    def __init__(self, name):
        self.name = name

obj1 = test('a')
obj2 = test('a')

hash1 = magicHash(obj1)
hash2 = magicHash(obj2)

What I'm looking for is something where hash1 == hash2. Does something like this exist in Python? I know I can test whether obj1.name == obj2.name, but I'm looking for something general that I can use on any object.

+4  A: 

You mean something like this? Using the special method __hash__

class test:
     def __init__(self, name):
         self.name = name
     def __hash__(self):
         return hash(self.name)

>>> hash(test(10)) == hash(test(20))
False
>>> hash(test(10)) == hash(test(10))
True
Nadia Alramli
It's not guaranteed to be unique though.
Bastien Léonard
@Bastien, you are right. But that really depends on the application. For many cases hash could be enough.
Nadia Alramli
It is not recommended to return anything besides an int from __hash__(self) (http://docs.python.org/reference/datamodel.html#object.__hash__), as this will make the object appear hashable (e.g. usable as a dict key) when it is not.
TokenMacGuy
@TokenMacGuy, thanks for pointing this out. I fixed it
Nadia Alramli
If you have more than one attribute a tuple works too, e.g.: hash((self.first_name, self.last_name))
Matt Good
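A minimal sketch of the tuple approach from the comment above, assuming Python 3. The Person class and its _key helper are hypothetical names chosen for illustration; only first_name and last_name come from the comment. Defining __eq__ alongside __hash__ is what lets equal-valued objects collapse to one entry in a set or dict.

```python
class Person:
    """Hypothetical example class for illustration."""

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def _key(self):
        # The tuple of attributes that defines both equality and the hash.
        return (self.first_name, self.last_name)

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hashing a tuple combines the hashes of its elements.
        return hash(self._key())


p1 = Person("Ada", "Lovelace")
p2 = Person("Ada", "Lovelace")
print(hash(p1) == hash(p2))  # True
print(len({p1, p2}))         # 1
```

Objects used this way should not have these attributes changed while they sit in a set or dict, since the hash would change underneath the container.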
+2  A: 

Have a look at the hash() built-in function and the __hash__() object method. These may be just what you are looking for. You will have to implement __hash__() for your own classes.

Ber
+3  A: 

To get a unique comparison:

To be unique you could serialize the data and then compare the serialized value to ensure it matches exactly.

Example:

import pickle

class C:
  i = 1
  j = 2

c1 = C()
c2 = C()
c3 = C()
c1.i = 99

unique_hash1 = pickle.dumps(c1) 
unique_hash2 = pickle.dumps(c2) 
unique_hash3 = pickle.dumps(c3) 

unique_hash1 == unique_hash2 #False
unique_hash2 == unique_hash3 #True

If you don't need unique values for each object, but mostly unique:

Note the same value will always reduce to the same hash, but 2 different values could reduce to the same hash.

You cannot use something like the built-in hash() function (unless you override __hash__)

hash(c1) == hash(c2) #False
hash(c2) == hash(c3) #False <--- Wrong

Instead, you could serialize the data using pickle and then apply zlib.crc32:

import zlib
crc1 = zlib.crc32(pickle.dumps(c1))
crc2 = zlib.crc32(pickle.dumps(c2))
crc3 = zlib.crc32(pickle.dumps(c3))
crc1 == crc2 #False
crc2 == crc3 #True
Brian R. Bondy
For the unique comparison you could also use zlib.compress to make the representation a little smaller if your objects are very big.
Brian R. Bondy
No, pickle is not good for hashing. The results can vary, as described by Robert Brewer: http://www.aminus.org/blogs/index.php/2007/11/03/pickle_dumps_not_suitable_for_hashing?blog=2
Matt Good
Not sure why but with CPython 2.5.1 I can't reproduce his behavior. It always hashes to the same result for me.
Brian R. Bondy
@Matt Good: If you continue to read the comments on the blog post, you'll see that this problem is with cPickle not pickle. And is due to reference counting.
Brian R. Bondy
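If the stability of pickle output is a concern, one alternative (not proposed anywhere in this thread) is to build the digest from the attributes directly. The sketch below assumes Python 3 and that every attribute has a stable, value-based repr(). The names attr_digest and Point are hypothetical.

```python
import hashlib


def attr_digest(obj):
    """Hypothetical helper: hex digest built from an instance's attributes."""
    # Sort by attribute name so assignment order does not affect the result.
    items = sorted(vars(obj).items(), key=lambda kv: kv[0])
    # Assumption: repr() of every attribute value is stable and value-based.
    payload = repr(items).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


a = Point(1, 2)
b = Point(1, 2)
c = Point(1, 3)
print(attr_digest(a) == attr_digest(b))  # True
print(attr_digest(a) == attr_digest(c))  # False
```

Unlike the built-in hash() on strings, which is salted per process in Python 3, a hashlib digest stays the same across runs. It will not work for attributes whose repr() includes a memory address, such as instances of classes that do not define __repr__.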
A: 

I guess

def hash_attr(ins):
    return hash(tuple(ins.__dict__.items()))

hashes any instance based on its attributes.

THC4k
As long as all the attributes are hashable..
John Fouhy
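To illustrate the hashability caveat: the one-liner above raises TypeError as soon as an attribute is a list, dict, or set. A possible workaround, sketched here for Python 3, is to convert such containers to hashable equivalents first. The freeze helper and the Box class are hypothetical, and any other unhashable type would still fail.

```python
def freeze(value):
    """Hypothetical helper: turn common unhashable containers into hashable ones."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    return value  # assumed to be hashable already


def hash_attr(ins):
    # Sort by attribute name so the result does not depend on assignment order.
    items = sorted(vars(ins).items(), key=lambda kv: kv[0])
    return hash(tuple((name, freeze(val)) for name, val in items))


class Box:
    def __init__(self, tags, meta):
        self.tags = tags
        self.meta = meta


b1 = Box(["a", "b"], {"k": [1, 2]})
b2 = Box(["a", "b"], {"k": [1, 2]})
print(hash_attr(b1) == hash_attr(b2))  # True
```

Note that freeze maps a list and a tuple with the same elements to the same value, so the two are not distinguished in the resulting hash.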