views:

251

answers:

5

From a recent SO question (see http://stackoverflow.com/questions/2671211/create-a-dictionary-in-python-which-is-indexed-by-lists) I realized I probably had a wrong concept of the meaning of hashable and immutable objects in python.

What hashable means in practice?,

What the relation between hashable and immmutable is?

There are mutable objects that are hashable? And immutable not hashable?

+6  A: 

Hashing is the process of converting some large amount of data into a much smaller amount (typically a single integer) in a repeatable way so that it can be looked up in a table in constant-time (O(1)), which is important for high-performance algorithms and data structures.

Immutability is the idea that an object will not change in some important way after it has been created, especially in any way that might change the hash value of that object.

The two ideas are related because objects which are used as hash keys must typically be immutable so their hash value doesn't change. If it was allowed to change then the location of that object in a data structure such as a hashtable would change and then the whole purpose of hashing for efficiency is defeated.

To really grasp the idea you should try to implement your own hashtable in a language like C/C++, or read the Java implementation of the HashMap class.

maerics
+2  A: 

Technically, hashable means that the class defines __hash__(). According to the docs:

__hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

I think that for the python builtin types, all hashable types are also immutable. It would be difficult but perhaps not impossible to have a mutable object that nonetheless defined __hash__().

Andrew Jaffe
Do you mean 'all hashable types are also immutable'?
joaquin
@joaquin: oops, fixed!
Andrew Jaffe
It's worth noting that `__hash__` is defined by default to return the object's `id`; you have to go out of your way to set `__hash__ = None` to make it unhashable. Also as Mark Ransom mentions there's an extra condition that it's only hashable if the hash value can never change!
Scott Griffiths
+1  A: 

Hashable means that an variable's value can be represented (or, rather, encoded) by a constant -- string, number, etc. Now, something that is subject to change (mutable) cannot be represented by something that is not. Therefore, any variable that is mutable cannot be hashable and, by the same token, only immutable variables can be hashable.

Hope this helps ...

kaloyan
+1  A: 

In Python they're mostly interchangeable; since the hash is supposed to represent the content, so it's just as mutable as the object, and having an object change the hash value would make it unusable as a dict key.

In other languages, the hash value is more related to the objects 'identity', and not (necessarily) to the value. Thus, for a mutable object, the pointer could be used to start the hashing. Assuming, of course, that an object doesn't move in memory (as some GC do). This is the approach used in Lua, for example. This makes a mutable object usable as a table key; but creates several (unpleasant) surprises for newbies.

In the end, having an immutable sequence type (tuples) makes it nicer for 'multi-value keys'.

Javier
@javier: 'In Python they're mostly interchangeable' My doubts refer to the small part not included in 'mostly'
joaquin
+1  A: 

From the Python Glossary:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().

Dicts and sets must use a hash for efficient lookup in a hash table; the values must also be immutable, because you can't change the hash once it's put in the table. That's why you often see these requirements mentioned together, because neither one is sufficient and both must be met.

Edit: Reading closely, I see that I glossed over something significant: An object is hashable if it has a hash value which never changes during its lifetime. So by the official definition, anything mutable can't be hashable, even if it has a __hash__() function. My statement about both requirements being necessary is untrue, because being hashable already implies the requirement to be immutable.

Mark Ransom
Good citation. You mean `frozenset`s, right? :)
Andrey Vlasovskikh
@Andrey: frozensets are hashable, sets aren't; both can only contain hashable items. In the places that Mark mentioned sets, he was correct, so I don't think he meant frozensets.
ΤΖΩΤΖΙΟΥ
User defined classes by default define hashable types (the hash is just the object's `id`). This can't change during the object's lifetime, so it is hashable, but that doesn't mean you can't define mutable types! Sorry, but hashability does not imply immutability.
Scott Griffiths