In Python, how does one extract the 32-bit prefix of a SHA-256 hash? I'm working with Google's Safe Browsing API, which requires that I compare 32-bit prefixes between my own collection and the collection the API sends to me. I understand how to pull the list from Google, and I understand how to form a collection of hashes from parsed URLs; however, I don't understand how to derive the first 32 bits of each hash.

And after obtaining the prefixes, would the best course of action be to place them in a dictionary, with the prefix as the key and the full hash as the value, so that I can reference them later?

A:

32 bits is 4 bytes, so you can slice the first 4 bytes off the raw digest:

hash_obj.digest()[:4]

You can take that and use it as a dictionary key.
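A minimal sketch of that lookup table, assuming Python 3 and assuming each URL expression has already been canonicalized the way Safe Browsing requires (canonicalization is not shown). The names build_prefix_index, find_matches, local_expressions and server_prefixes are hypothetical, and the "server" prefix list is faked locally. Because different full hashes can share the same 4-byte prefix, each key maps to a list of full hashes rather than a single one.

```python
import hashlib
from collections import defaultdict


def build_prefix_index(expressions):
    """Map each 4-byte SHA-256 prefix to the full 32-byte digests that start with it.

    `expressions` is assumed to be an iterable of already-canonicalized
    URL expressions (str); canonicalization is out of scope here.
    """
    index = defaultdict(list)
    for expr in expressions:
        full_hash = hashlib.sha256(expr.encode("utf-8")).digest()
        index[full_hash[:4]].append(full_hash)  # first 4 bytes = first 32 bits
    return index


def find_matches(index, server_prefixes):
    """Return the full hashes whose 4-byte prefix appears in server_prefixes."""
    matches = []
    for prefix in server_prefixes:
        # .get() avoids inserting empty lists into the defaultdict
        matches.extend(index.get(prefix, []))
    return matches


# Hypothetical local data and a hypothetical prefix list standing in for the server's.
local_expressions = ["example.com/", "example.org/path"]
index = build_prefix_index(local_expressions)
server_prefixes = [hashlib.sha256(b"example.com/").digest()[:4]]
print(find_matches(index, server_prefixes))
```

A plain set of prefixes would be enough for a yes/no check; the dictionary is what lets you go from a matching prefix back to the full hashes you still need to confirm the match.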

EDIT

I'm not sure whether you need the hex representation; if you do, it would be:

hash_obj.hexdigest()[:8]
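A quick check that the two slices describe the same 32 bits, assuming Python 3.5+ (for bytes.hex()); the input string is an arbitrary example. The raw prefix rendered as hex equals the first 8 hex characters, and bytes.fromhex converts back.

```python
import hashlib

hash_obj = hashlib.sha256(b"example.com/")  # arbitrary example input

raw_prefix = hash_obj.digest()[:4]      # bytes object of length 4
hex_prefix = hash_obj.hexdigest()[:8]   # str of 8 hex characters

# Both describe the same 32 bits, just in different forms.
assert raw_prefix.hex() == hex_prefix
assert bytes.fromhex(hex_prefix) == raw_prefix
print(raw_prefix, hex_prefix)
```

Compare like with like: in Python 3 a raw bytes prefix never compares equal to a hex string, so convert whichever side you have to match the other.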
mikerobi
Working like a charm, thank you. Is the slice indicating the number of bits or bytes? How can I tell the difference? I.e., why is the hex slice :8?
Stev0
@Stev0, each element of the raw digest is one byte (8 bits), the same size as one ASCII character, so taking the first 4 gives you 32 bits (4*8). I use the term "character" loosely: depending on the Python version, digest() returns a str (Python 2) or a bytes object (Python 3), and either way each element is a single byte. In hexadecimal notation, each character represents 4 bits, so representing one byte takes 2 characters, and 32 bits take the first 8 characters. Just to be clear, the hex version is a human-friendly representation of the first 32 bits; the non-hex form is the first 32 bits themselves. http://en.wikipedia.org/wiki/Hexadecimal
mikerobi
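The bits-versus-bytes arithmetic from that comment can be checked directly (a sketch assuming Python 3; the input is an arbitrary example): the slice length counts bytes for the raw digest and hex characters for the hex digest, and both come to 32 bits. Converting each form to an integer gives the same 32-bit number.

```python
import hashlib

h = hashlib.sha256(b"example.com/")  # arbitrary example input

raw = h.digest()[:4]       # slice counts bytes: 4 bytes
hexs = h.hexdigest()[:8]   # slice counts hex characters: 8 characters

raw_bits = len(raw) * 8    # 8 bits per byte
hex_bits = len(hexs) * 4   # 4 bits per hex character
print(raw_bits, hex_bits)  # prints: 32 32

# In Python 3, iterating over a bytes object yields one int in range(256) per byte.
print(list(raw))

# Same 32-bit value either way (hexdigest lists the most significant byte first).
assert int.from_bytes(raw, "big") == int(hexs, 16)
```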