tags:

views:

127

answers:

4

Hi all

I am a newbie to the python. Can I unhash, or rather how can I unhash a value. I am using std hash() function. What I would like to do is to first hash a value send it somewhere and then unhash it as such:

#process X
hashedVal = hash(someVal)
#send n receive in process Y
someVal = unhash(hashedVal)
#for example print it
print someVal

Thx in advance

+3  A: 

You can't "unhash" data, hash functions are irreversible due to the pigeonhole principle

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Pigeonhole_principle

I think what you are looking for encryption/decryption. (Or compression or serialization as mentioned in other answers/comments.)

Andrew Russell
Or compression/decompression.
Felix Kling
A: 

This is not possible in general. A hash function necessarily loses information, and python's hash is no exception.

crazyscot
+8  A: 

It can't be done.

A hash is not a compressed version of the original value, it is a number (or something similar ) derived from the original value. The nature of hash implementations is that it is possible (but statistically unlikely if the hash algorithm is a good one) that two different objects produce the same hash value.

This is known as the Pigeonhole Principle which basically states that if you have N different items, and want to place them into M different categories, where the N number is larger than M (ie. more items than categories), you're going to end up with some categories containing multiple items. Since a hash value is typically much smaller in size than the data it hashes, it follows the same principles.

As such, it is impossible to go back once you have the hash value. You need a different way of transporting data than this.

For instance, an example (but not a very good one) hash algorithm would be to calculate the number modulus 3 (ie. the remainder after dividing by 3). Then you would have the following hash values from numbers:

1 --> 1  <--+- same hash number, but different original values
2 --> 2     |
3 --> 0     |
4 --> 1  <--+

Are you trying to use the hash function in this way in order to:

  • Save space (you have observed that the hash value is much smaller in size than the original data)
  • Secure transportation (you have observed that the hash value is difficult to reverse)
  • Transport data (you have observed that the hash number/string is easier to transport than a complex object hierarchy)

... ?

Knowing why you want to do this might give you a better answer than just "it can't be done".

For instance, for the above 3 different observations, here's a way to do each of them properly:

  • Compression/Decompression, for instance using gzip or zlib (the two typically available in most programming languages/runtimes)
  • Encryption/Decryption, for instance using RSA, AES or a similar secure encryption algorithm
  • Serialization/Deserialization, which is code built to take a complex object hierarchy and produce either a binary or textual representation that later on can be deserialized back into new objects
Lasse V. Karlsen
A: 

Sry I didn't post the answer in the "Add Comment" but there was not enough space

Thx to all for quick reply.

@Lasse V. Karlsen

I am writing a Freenet p2p system in python (student task). And I already have implemented search functionality. In short:

#Node A
sendSearchMsg(hashedKey) #file = {Key : FakeContent}
#Node n (till now I was/am using Int as a key/fileName)
sMsg = receiveMsg(hashedKey)
#search in local files
#if not found calculate next closest Node by key
#and calculating next node is a problem in the sens that I use simple math to do it
#tmpKeys <- filtered [] of already used nodes
#key <- the hashed file name that I am looking for
#filesMap <- {key : nodeId}
for i in tmpKeys:
        tmp.append(fabs(int(i) - key))
nextNodeID = filesMap.get(tmpKeys[tmp.index(min(tmp))])

When I use int as a file name/key there shouldn't be a problem, however when I will use String this won't work. I don't really have an idea how to calculate nextNodeID only using hashed values in the sens that the nextNodeID may be not the correct choice. So what I was thinking about is to "unhash" the key/name and then compare and calculate the nextNodeID. But with this hashed value I can only calculate the closest next hashed value in math sens and I don't think that it will produce the output I want from the function.

Just for the record the teacher has replied that I can use hash(key/int), thus this question/problem is no more (hash(12) == 12 uptill 10 digit key). However just for the "hunger of knowledge" if someone will/can answer I would be gratefull

blah01
Please don't leave answers like this unless they're answers. Instead, edit your question and put the new information at the bottom.
Lasse V. Karlsen