I have a little problem where I need to hash a number of about 10 digits into a number of 6 digits. The hash needs to be deterministic.

It's more important that the hash is not resource intensive.

For example, say that I have some number, x, like 123456789

I want to write a hash function that gives me a number, y, back like 987654.

I'd then like to have a function that takes the x and y as parameters, re-applies the hash on x, and checks that the result is y.

It should be difficult to compute possible input values given the hash.

My first idea of multiplying pairs of digits led to a lot of duplicate hashed values.

I have the feeling that this sort of problem has some kind of elegant solution, but I just can't think of it myself.

Can anyone help me out here? Thanks in advance :)

A: 

How about just discarding the lower 16 bits or last 4 digits?

1234567890 --> 123456

Easily done by just doing an integer division by 10000.

Lasse V. Karlsen
The point of hashing is that it should be hard to construct another duplicate easily, and I think that's what the OP means by encryption. This method doesn't quite help.
notnoop
But then he doesn't say much about what he intends to do with the value, in which case I tend to go for the simplest possible solution that would satisfy the stated requirements. Additionally, consider that for each unique 16-bit output, fully 65536 input values will map to that output (2^32 inputs divided by 2^16 outputs), assuming the mapping of input to output is evenly distributed. So there *will* be duplicates no matter what.
Lasse V. Karlsen
Sorry, I didn't describe my problem well. I have tried adding an example to clarify. I need the encryption/hashing to be harder to break than this.
Columbo
How about splitting your 32-bit number in two 16-bit numbers, reversing one of them, and xor'ing the two? In any case, as others have suggested, something like CRC16 would probably work out well.
Lasse V. Karlsen
+7  A: 

What you need is called "hashing".

Try CRC16.
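
As a rough illustration of what that could look like, here is a minimal sketch of one common CRC16 variant (CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF). The answer does not say which variant to use, so that choice is an assumption, as are the names `Crc16Sketch`, `Compute` and `HashNumber` and the decision to hash the decimal string of the number.

```csharp
using System;
using System.Text;

public static class Crc16Sketch
{
    // CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
    // bits processed most-significant first, no final XOR.
    public static ushort Compute(byte[] data)
    {
        ushort crc = 0xFFFF;
        foreach (byte b in data)
        {
            // Bring the next byte into the top 8 bits of the register.
            crc ^= (ushort)(b << 8);
            for (int i = 0; i < 8; i++)
            {
                bool topBitSet = (crc & 0x8000) != 0;
                int shifted = (crc << 1) & 0xFFFF;
                crc = (ushort)(topBitSet ? shifted ^ 0x1021 : shifted);
            }
        }
        return crc;
    }

    // Assumption: the number is hashed via its ASCII decimal digits.
    public static ushort HashNumber(uint x)
    {
        return Compute(Encoding.ASCII.GetBytes(x.ToString()));
    }
}
```

The result is always in the range 0 to 65535, so it fits in 6 decimal digits. For this variant the standard check value over the ASCII string "123456789" is 0x29B1, so `Crc16Sketch.HashNumber(123456789)` should give 10673. Note that a CRC spreads values well but is not designed to resist deliberate reversal.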

Pavel Radzivilovsky
+2  A: 

(( X>>16) ^ (X)) & 0xFFFF

an0nym0usc0ward
+1  A: 

What you want to do is distribute the hash values as evenly as possible over the range. Some of the built-in hashing methods are fairly good at this, so you could perhaps try something like getting the hash code of the string representation and simply throwing away half of the bits:

ushort code = (ushort)value.ToString().GetHashCode();

However, it also depends on what you are going to use the hash code for. The built-in hash codes are not intended to be stored permanently. The algorithms for calculating the hash codes can change with any new version of the framework, so if you store the hash codes in a database they may become useless in the future. In that case you would instead have to create the hashing algorithm yourself from scratch, or use a hashing algorithm that was designed for permanent storage.

One simple algorithm that is used for hash codes for some values in the framework is to use exclusive or to make all bits in the value matter when the hash code is smaller than the data:

byte[] b = BitConverter.GetBytes(value);
ushort code = (ushort)(BitConverter.ToUInt16(b, 0) ^ BitConverter.ToUInt16(b, 2));

or the more efficient but less obvious way to do the same:

ushort code = (ushort)((value >> 16) ^ value);

This of course has no obfuscating properties for small values, so you might want to throw in some "random" bits to make the hash code significantly different from the value:

ushort code = (ushort)(0x56D4 ^ (value >> 16) ^ value);
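
Putting that last line together with the check the question asks for (re-apply the hash on x and compare with y) might look roughly like the sketch below. The names `FoldHashSketch`, `Hash` and `Verify` are made up for illustration, `value` is assumed to be a 32-bit unsigned integer, and 0x56D4 is just the example constant from above; a real secret would be chosen by you.

```csharp
public static class FoldHashSketch
{
    // Hypothetical secret: 0x56D4 is only the example constant from the answer.
    private const ushort Secret = 0x56D4;

    // XOR the upper and lower 16 bits together with the secret,
    // then keep only the low 16 bits of the result.
    public static ushort Hash(uint x)
    {
        return (ushort)((Secret ^ (x >> 16) ^ x) & 0xFFFF);
    }

    // Re-applies the hash to x and checks that it matches y.
    public static bool Verify(uint x, ushort y)
    {
        return Hash(x) == y;
    }
}
```

For x = 123456789 (0x075BCD15) this gives 0x075B ^ 0xCD15 ^ 0x56D4 = 0x9C9A, which is 40090 in decimal, so `FoldHashSketch.Verify(123456789, 40090)` returns true.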
Guffa
Thanks for the code and the explanation! Throwing a secret number into the mix adds a simple extra element of secrecy to the method. I'm going to implement this approach. Also, an interesting point that the built-in hashing algorithms in the framework are subject to change.
Columbo
+6  A: 

Your problem as stated is not solvable.

You say that you want the system to be "somewhat hard to break", by which I assume you mean that it is "somewhat hard" for an attacker to take a known digest and produce from it a possible input which hashes to the given digest. Since there are only 4 billion possible inputs and only 65536 possible hashes in the system you propose, it is utterly trivial to find a message that corresponds to a given hash, no matter what the hash algorithm is. On average, the attacker will have about 65000 possible messages to choose from, and can therefore cherry-pick the message that best serves his nefarious scheme.

I would expect a "somewhat hard" problem in the hash-breaking space to require dedicating, say, a few million dollars' worth of supercomputer time to break. Your proposal can be broken by inexperienced high school students writing JavaScript programs that take a couple of minutes to write and maybe a minute to run, tops; this is not even vaguely close to "somewhat hard".
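
To make that concrete, here is a minimal sketch of such a brute-force search. It is only an illustration: `PreimageSketch`, `FindPreimage` and `ExampleHash` are hypothetical names, and `ExampleHash` is a stand-in 16-bit hash (an XOR fold with an arbitrary constant) assumed for the demo. The search itself does not care which hash function is plugged in.

```csharp
using System;

public static class PreimageSketch
{
    // Tries inputs 0, 1, 2, ... below limit and returns the first one
    // whose hash equals target, or null if none is found.
    public static uint? FindPreimage(Func<uint, ushort> hash, ushort target,
                                     uint limit = uint.MaxValue)
    {
        for (uint x = 0; x < limit; x++)
        {
            if (hash(x) == target)
            {
                return x;
            }
        }
        return null;
    }

    // Stand-in hash for the demo (an assumption, not the OP's algorithm).
    public static ushort ExampleHash(uint x)
    {
        return (ushort)((0x56D4 ^ (x >> 16) ^ x) & 0xFFFF);
    }
}
```

With 2^32 inputs and 2^16 digests, each digest has about 2^32 / 2^16 = 65536 preimages on average, so for an evenly distributing hash a match is expected after roughly 65536 tries. With the stand-in hash, `PreimageSketch.FindPreimage(PreimageSketch.ExampleHash, 0x9C9A)` returns 51790.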

Why are you choosing such tiny limits on your algorithm, limits which will by their very nature make it trivial to break the hashing? And for that matter, what's the value in hashing such a tiny amount of data as a 32-bit integer?

Eric Lippert