views:

168

answers:

6

I would like an algorithm for a function that takes n integers and returns one integer. For small changes in the input, the resulting integer should vary greatly. Even though I've taken a number of courses in math, I have not used that knowledge very much and now I need some help...

An important property of this function should be that if it is used with coordinate pairs as input and the result is plotted (as a grayscale value for example) on an image, any repeating patterns should only be visible if the image is very big.

I have experimented with various algorithms for pseudo-random numbers with little success and finally it struck me that md5 almost meets my criteria, except that it is not for numbers (at least not from what I know). That resulted in something like this Python prototype (for n = 2, it could easily be changed to take a list of integers of course):

import hashlib
def uniqnum(x, y):
    return int(hashlib.md5(str(x) + ',' + str(y)).hexdigest()[-6:], 16)

But obviously it feels wrong to go over strings when both input and output are integers. What would be a good replacement for this implementation (in pseudo-code, python, or whatever language)?

+4  A: 

Hashing fits your requirements perfectly. If you really don't want to use strings, find a Hash library that will take numbers or binary data. But using strings here looks OK to me.

Henk Holterman
A: 

Have a look at this, may be you can be inspired

Chaotic system

In chaotic dynamics, small changes vary results greatly.

Qubeuc
Thanks, but I am looking more for an actual algorithm.
Peter Jaric
+7  A: 

A "hash" is the solution created to solve exactly the problem you are describing. See wikipedia's article

Any hash function you use will be nice; hash functions tend to be judged based on these criteria:

  • The degree to which they prevent collisions (two separate inputs producing the same output) -- a by-product of this is the degree to which the function minimizes outputs that may never be reached from any input.
  • The uniformity the distribution of its outputs given a uniformly distributed set of inputs
  • The degree to which small changes in the input create large changes in the output.

(see perfect hash function)

Given how hard it is to create a hash function that maximizes all of these criteria, why not just use one of the most commonly used and relied-on existing hash functions there already are?

From what it seems, turning integers into strings almost seems like another layer of encryption! (which is good for your purposes, I'd assume)

However, your question asks for hash functions that deal specifically with numbers, so here we go.


Hash functions that work over the integers

If you want to borrow already-existing algorithms, you may want to dabble in pseudo-random number generators

One simple one is the middle square method:

  • Take a digit number
  • Square it
  • Chop off the digits and leave the middle digits with the same length as your original.

ie,

1111 => 01234321 => 2342

so, 1111 would be "hashed" to 2342, in the middle square method.

This way isn't that effective, but for a few number of hashes, this has very low collision rates, a uniform distribution, and great chaos-potential (small changes => big changes). But if you have many values, time to look for something else...

The grand-daddy of all feasibly efficient and simple random number generators is the (Mersenne Twister)[http://en.wikipedia.org/wiki/Mersenne_twister]. In fact, an implementation is probably out there for every programming language imaginable. Your hash "input" is something that will be called a "seed" in their terminology.

In conclusion

  1. Nothing wrong with string-based hash functions
  2. If you want to stick with the integers and be fancy, try using your number as a seed for a pseudo-random number generator.
Justin L.
This is a great answer, *but*: one of my unsuccessful attempts was with the Mersenne Twister, so what I think I need is an actual example implementation, and pseudo code is OK. It's been at least a year (probably more) since I was experimenting with this, so I don't remember exactly what my problem was, though.
Peter Jaric
Oops, I didn't notice that you specifically mentioned trying pseudo-random numbers in your question. I feel sorta dumb now.Anyways, is there any reason you are attempting to write Mersenne from scratch, as opposed to using a pre-existing library? What language are you working in?
Justin L.
see Pete Kirkham's answer to Bob Jenkins' Mix function; it seems like something you could use.
Justin L.
@mstksg: when I look at the code at the wikipedia page I see something that takes one seed and then can generate number after number. Is that trivially convertable to a function fun(n0, n1, ...) -> x?
Peter Jaric
To answer your question above rewriting: I don't need to create it from scratch, but I need it to fit my desired signature.
Peter Jaric
I see two ways of converting it into fun(n_0,n_1...) => x.The first involves definining the pth Twister number of seed n as M(n,p), and then defining your function as M(n_3, M(n_2, M(n_1, M(n_0,0)))), for as many n's as you have (possibly running modulus on each p, to reduce running time).The second involves combining n_0,n_1 into one big seed by either concatenation or multiplication, and finding the first number outputted by the Twister.In reality, there are many ways; if there are trivial, self-suggesting ones, they slip my mind at the moment =(
Justin L.
Note that hash functions work on any input, including integers. In fact, one of the rudimentary cryptography theorems is that hash functions and PRNGs are equivalent (given one you can build the other)!
BlueRaja - Danny Pflughoeft
Hello! I have come back with a better idea than the previous one.It is inspired by [this PRNG](http://en.wikipedia.org/wiki/Linear_congruential_generator); simply pick your two of the constants m, c, and a to stay constant, and let your input be the value of the third one; iterate it for each input.
Justin L.
+3  A: 

Bob Jenkins' mix function is a classic choice, at when n=3.

Pete Kirkham
I've just briefly scanned the text, so: Is it limited to just n = 3? It says something about "The keys are unaligned variable-length byte arrays" which seems to indicate that any n can be used.
Peter Jaric
You can use mix to increase the distance between the results of a simple hashing of lots of numbers, or use it directly on three numbers.
Pete Kirkham
+1  A: 

As others point out, hash functions do exactly what you want. Hashes take bytes - not character strings - and return bytes, and converting between integers and bytes is, of course, simple. Here's an example python function that works on 32 bit integers, and outputs a 32 bit integer:

import hashlib
import struct

def intsha1(ints):
  input = struct.pack('>%di' % len(ints), *ints)
  output = hashlib.sha1(input).digest()
  return struct.unpack('>i', output[:4])

It can, of course, be easily adapted to work with different length inputs and outputs.

Nick Johnson
A: 

A x-bit block cipher will take an number and convert it effectively to another number. You could combine (sum/mult?) your input numbers and cipher them, or iteratively encipher each number - similar to a CBC or chained mode. Google 'format preserving encyption'. It is possible to create a 32-bit block cipher (not widely 'available') and use this to create a 'hashed' output. Main difference between hash and encryption, is that hash is irreversible.

andora