tags:

views:

544

answers:

6

Hello,

I need a hash function for a Look Up table, so that if my values are from 0 to N, I need a hash function that give me a value from 0 to n, being n << N. Another piece of information is that I already know N in advance.

I have been investigatinv about different low cost hash functions and I have found only this:

h = z mod n  range(z) - 0 to N, range(h) - 0 to n

My hash function needs to be implemented in HW, so it needs to have a very low cost. Can anyone recommend any other formula or algorithm apart from that simple thing?. When I say HW I mean a truly implementation in HW, and not instructions in a microprocessor.

Thank you.

Update with the solution

Thanks for all the answer, I am not going to select a favourite one, because all of them are equally valid depending on the characteristics of the target application.

A: 

Rewire bits in random order and take lower log2(n) bits

Or just take lower log2(n) bits if your data is evenly distributed.

Quassnoi
+2  A: 

CRC?

There is already a lot of hardware support for this too.

Adam Peck
+4  A: 

The canonical form of that is h(x) = (a*x + b) mod n, where a and b are constants and n is the size of your hash table. You want to make n a prime number, to get optimal(ish) distribution.

Note that this is sensitive to certain kind of distributions -- for example, just doing x mod n is mostly relying on randomness of low-order bits; if they are not random in your set, you will get fairly significant skew.

Bob Jenkins has designed several very good hashing functions; here's one specifically designed to be simple to implement in hardware: http://burtleburtle.net/bob/hash/nandhash.html

For a lot of different hash functions, design discussions, etc, see the rest of the site: http://burtleburtle.net/bob/hash/

SquareCog
Don't you mean "...just doing _x_ mod n is mostly ..." ?
David Schmitt
yes I do, thanks
SquareCog
The b in (a*x+b) mod n won't affect anything, in that things that collide still will, and things that don't still won't.
A. Rex
+2  A: 

I believe this is the best possible hash for this problem (faster than modulo, better distribution), given that all your numbers in 0..N have the same probability:

h = z * n / N;

Where all values are integers, so you have an integer division. This way each value between 0..N is mapped to exactly the same number of values in n.

For example, when n=3 and N=7 (values 3 and 7 not included in the ranges), the hashes are this:

z * n / N = hash
----------------
0 * 3 / 7 = 0
1 * 3 / 7 = 0
2 * 3 / 7 = 0
3 * 3 / 7 = 1
4 * 3 / 7 = 1
5 * 3 / 7 = 2
6 * 3 / 7 = 2

So each hash values are used equally often, just off by 1. Just take care that n*(N-1) does not overflow.

If N is a power of 2, you can replace the division by shifting. e.g. if N=256:

h = (z * n) >> 8;
martinus
+1  A: 

If you're truly talking hardware (vs. software, or hardware implementation of software), and your number of hash buckets n can be written as n = 2m - 1, the easiest is probably a maximum-length linear feedback shift register (LFSR) of which CRC is an instance.

Here's one way you could use an m-bit shift register to create a hash of a data packet (make sure all data is represented consistently as a K-bit string, if you have shorter strings then pad one end with zeros):

  1. Initialize the state of the LFSR (CRC-32 uses all 1's; all zeros is probably bad)
  2. Shift in the bits of your data
  3. (Optional) Shift in an additional j zeros (j between m and 2m is probably a good choice); this adds some additional hashing to reduce direct correlation between input/output bits
  4. Use the contents of the m-bit shift register as your hashed value.
Jason S
A: 

Bob Jenkins's FNV hash.

Mitch Wheat