The topic title pretty much says it all. Does anyone know how Python's built-in dictionary type is implemented? My understanding is that it is some sort of hash table, but I haven't been able to find a definitive answer.

+3  A: 

It is a hash table. You can read a bit about it in the Python wiki. Otherwise, the code is well written and should be easy to understand.

Dustin
OK, so your hash function returns 1857. You will allocate a list with 1857 locations and put your data in the last location? I believe not! Hashes are used only for comparison.
You should probably understand how hash tables work before declaring everyone's explanations incorrect: http://en.wikipedia.org/wiki/Hash_table
Dustin
"Hash tables in general exhibit poor locality of reference—that is, the data to be accessed is distributed seemingly at random in memory." hash("apple") = 355203333. So, what do you do with this number?
The "Basic Operation" section at the top describes how that works. The locality of reference problem is addressed in the open addressing section. I'd recommend trying to write a hash table. They're pretty easy to write, and that's the easiest way to understand them.
Dustin
"The number is normally converted into the index by taking a modulo, or sometimes bit masking is used where the array size is a power of two." Yes, it works if you know the size of the array a priori. Here you don't!
What a bizarre assertion. Can you show me what line of code you're looking at?
Dustin
hash("apple") = 355203333. The length of the array is 128, so the index is 355203333 % 128 = 5. Now you put 50,000 items in the same dictionary. You think the array doesn't need to be resized?
I don't understand why that concerns you. Are you looking at the latest Python source? Do you see line 515 of dictobject.c? These guys have put a lot of thought into it, and they've written it down in a few places for you to read.
Dustin
@sharpsy: Yes, it'll be resized, *if* the insert detects that the dict is sufficiently (about 66%) full. At that point, the array is resized, all existing keys get re-inserted at new positions, and the new key gets inserted at the position computed for the new size. At the time of insertion, the size is a known value.
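To make the resize step concrete, here is a minimal toy sketch, not CPython's actual code. The class name `ToyDict`, the starting size of 8, the doubling growth factor, and the use of linear probing are all assumptions made for illustration. Only the "resize at roughly two-thirds full and re-insert every key" behaviour comes from the discussion above.

```python
class ToyDict:
    """Toy open-addressing hash table (hypothetical, for illustration only)."""

    def __init__(self, size=8):
        # Size is kept a power of two so the index can be taken with a bit mask.
        self.slots = [None] * size  # each slot is None or a (key, value) pair
        self.used = 0

    def _find(self, key):
        """Return the slot holding `key`, or the empty slot where it would go."""
        mask = len(self.slots) - 1
        i = hash(key) & mask  # same result as hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) & mask  # linear probing: a simplification
        return i

    def __setitem__(self, key, value):
        i = self._find(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)
        # The table size is known here, so the load factor is easy to check.
        if self.used * 3 >= len(self.slots) * 2:  # at least 2/3 full
            self._resize()

    def __getitem__(self, key):
        entry = self.slots[self._find(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _resize(self):
        """Double the array and re-insert every entry at its new position."""
        old_entries = [e for e in self.slots if e is not None]
        self.slots = [None] * (len(self.slots) * 2)
        self.used = 0
        for key, value in old_entries:
            self[key] = value


d = ToyDict()
for n in range(50_000):
    d[n] = n * 2
print(d[49_999], len(d.slots))
```

The array starts small and grows as items arrive, so no final size has to be known up front. Each index is computed against whatever the size is at that moment.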
Brian
@sharpsy: You sure have a big head for someone whose CS knowledge is severely lacking.
FogleBird
@sharpsy: You can also read how Java's HashMap is implemented if you're actually interested in learning more (it's a good read): http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html
FogleBird
+7  A: 

It is open hashing based on a primitive polynomial over Z/2. (Old link)

Please refer to Beautiful Code, edited by Andy Oram and Greg Wilson. There is an excellent chapter titled "Python's Dictionary Implementation: Being All Things to All People" by Andrew Kuchling.

bhadra
This is a good answer, with very good references, but I'm voting it down, since it is **incorrect**! Your primary source is confused. The true answer is *open addressing*, the opposite(!). See my answer.
kaizer.se
Thank you for the correction. :-)
bhadra
+4  A: 

Pure Python Dictionary Implementation

For those curious about how CPython's dict implementation works, I've written a Python implementation using the same algorithms.

pantsgolem
+2  A: 

Here's a link to the actual implementation in the Python SVN repository. That should be the most definitive answer.

David Locke
+4  A: 

Python dictionaries use open addressing (reference: Beautiful Code).

NB! Open addressing, a.k.a. closed hashing, should not, as noted on Wikipedia, be confused with its opposite, open hashing (which we see in the accepted answer)!

Open addressing means that the dict stores its entries directly in array slots. When an object's primary position in the dict is already taken, a spot for it is sought at a different index in the same array, using a "perturbation" scheme in which the object's hash value plays a part.
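A rough sketch of such a probe sequence is below. It is modelled on the recurrence described in the comments of CPython's dictobject.c as I understand them, so treat the details as assumptions: the function name `probe_sequence` is made up, `PERTURB_SHIFT = 5` is assumed, and the hash is treated as an unsigned 64-bit value.

```python
from itertools import islice

PERTURB_SHIFT = 5  # assumed constant; check dictobject.c for the real value


def probe_sequence(hash_value, table_size):
    """Yield the slot indices to try, in order.

    `table_size` must be a power of two. This is an illustrative sketch,
    not the actual CPython implementation.
    """
    mask = table_size - 1
    perturb = hash_value & 0xFFFFFFFFFFFFFFFF  # treat the hash as unsigned
    i = perturb & mask  # primary position: the low bits of the hash
    while True:
        yield i
        # Mix the higher bits of the hash into the next index, a few at a time.
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask


# First few slots tried for the hash value quoted earlier in this thread,
# in a table of 8 slots.
first_probes = list(islice(probe_sequence(355203333, 8), 5))
print(first_probes)
```

Once `perturb` has been shifted down to zero, the recurrence reduces to `i = (5*i + 1) mod table_size`, which visits every slot of a power-of-two table, so a free slot is always found eventually as long as the table is not full.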

kaizer.se