ansaurus

Question

Simple hash function (1 byte output from string input)

Answer 1

+1 A:

Use a CRC-8. Its generator polynomial is 9 bits wide, but the checksum it produces is exactly 8 bits, so nothing needs to be dropped. Otherwise use any of the other common CRC algorithms and keep a single byte of the result.
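
As a rough illustration, a bitwise CRC-8 might look like the sketch below. The class and method names are made up for this example, and the parameters (polynomial 0x07, initial value 0, no bit reflection, no final XOR) are an assumption; the answer does not say which CRC-8 variant it has in mind.

```java
import java.nio.charset.StandardCharsets;

class Crc8Sketch {
    // Assumed generator polynomial x^8 + x^2 + x + 1; the x^8 term is implicit, leaving 0x07.
    private static final int POLY = 0x07;

    /** Returns an 8-bit CRC (0..255) over the UTF-8 bytes of the input. */
    static int crc8(String input) {
        int crc = 0; // assumed initial value
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            crc ^= (b & 0xff);
            for (int bit = 0; bit < 8; bit++) {
                // Shift left one bit; if a 1 was shifted out of the top, XOR in the polynomial.
                if ((crc & 0x80) != 0) {
                    crc = ((crc << 1) ^ POLY) & 0xff;
                } else {
                    crc = (crc << 1) & 0xff;
                }
            }
        }
        return crc;
    }

    public static void main(String[] args) {
        System.out.printf("0x%02X%n", crc8("user@example.com"));
    }
}
```

A table-driven version would be faster, but for short strings such as email addresses the bitwise loop is probably sufficient.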

whatsisname 2009-12-28 17:32:08

Answer 2

+1 A:

Why not take the most or least significant byte of the standard String hashCode() function?
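
A minimal sketch of that idea in Java follows; the class and method names are hypothetical, chosen only for this example.

```java
class HashCodeByteSketch {
    /** Least significant byte of String.hashCode(), as a value in 0..255. */
    static int lowByte(String s) {
        return s.hashCode() & 0xff;
    }

    /** Most significant byte of String.hashCode(), as a value in 0..255. */
    static int highByte(String s) {
        // Unsigned shift so that negative hash codes do not sign-extend.
        return (s.hashCode() >>> 24) & 0xff;
    }

    public static void main(String[] args) {
        System.out.println(lowByte("user@example.com"));
        System.out.println(highByte("user@example.com"));
    }
}
```

For very short strings the most significant byte stays at zero because the hash value is still small, so the least significant byte is likely the safer choice of the two.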

Brian Agnew 2009-12-28 17:32:10

That should work as a Java solution, but I'm holding out for an XSLT one.

Moose Morals 2009-12-28 17:36:32

Answer 3

+1 A:

Every hash function has its strengths and weaknesses, and fast, easy-to-compute ones tend to behave badly for certain classes of data. Trial and error needs to be a part of any solution. In addition to the other suggestions, you might try using integer multiplication as part of the hash function, for example:

int hash = 0;  // data is the input as a byte array
for (int i = 0; i < data.length; i++)
    hash = ((37 * hash) + data[i]) & 0xff;  // keep only the low 8 bits

GregS 2009-12-28 17:41:40

Answer 4

A:

My suggestion would be to simply XOR all the bytes in the string. Every bit of every byte will influence the end result, and any single-bit error will definitely cause the hash to differ.

Very simple, very fast. And probably nearly as good as any other solution, given the small number of result bits.
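
A minimal Java sketch of the XOR approach, assuming the string is hashed over its UTF-8 bytes (the encoding and the names used here are assumptions for illustration):

```java
import java.nio.charset.StandardCharsets;

class XorHashSketch {
    /** XORs all UTF-8 bytes of the input together; the result is in 0..255. */
    static int xorHash(String input) {
        int hash = 0;
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff); // mask so negative bytes do not sign-extend
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(xorHash("user@example.com"));
    }
}
```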

Carl Smotricz 2009-12-28 17:43:27

You'd probably want a bit more than that, as most email addresses are predominantly lowercase ASCII with one '@' and one '.', so you get only about 5 bits of variation rather than 8.

Pete Kirkham 2009-12-28 17:53:20

I don't believe the wildly simplistic premise of the question justifies any more effort than this. Does it really matter if you detect different addresses in 31/32 cases (=96.9%) or 255/256 (=99.6%)?

Carl Smotricz 2009-12-28 18:06:10

A possible problem with a simple XOR is that it is unaffected by order, so that "abc" has the same hash as "cba". But the XOR could ultimately work as well as or better than something more painful, because, as you point out, it is basically impossible to do very well with an 8-bit output. The XOR method will detect an odd number of bit errors in *any* of the bit positions, a definite plus.

GregS 2009-12-28 18:18:15

At the risk of overengineering this, I would be tempted to compute the low-order 5 bits of the hash as you suggested, and the high-order 3 bits using something like my earlier method but with hash3 = ((5 * hash3) + data[i]) mod 7 instead. This should also detect about 6/7 of the misordered addresses.
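
One way to read this suggestion is sketched below in Java. The class and method names are hypothetical, hashing over UTF-8 bytes is an assumption, and placing hash3 in the top three bits with a shift is just one plausible way to combine the two parts.

```java
import java.nio.charset.StandardCharsets;

class CombinedHashSketch {
    /** Low 5 bits from the XOR of all bytes, high 3 bits from an order-sensitive hash mod 7. */
    static int combinedHash(String input) {
        int xor = 0;   // order-insensitive part
        int hash3 = 0; // order-sensitive part, always in 0..6
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            xor ^= v;
            hash3 = ((5 * hash3) + v) % 7;
        }
        // hash3 fills bits 5..7, the XOR keeps bits 0..4; the result fits in one byte.
        return (hash3 << 5) | (xor & 0x1f);
    }

    public static void main(String[] args) {
        System.out.println(combinedHash("abc")); // differs from "cba" thanks to hash3
        System.out.println(combinedHash("cba"));
    }
}
```

Because hash3 never reaches 7, the top three bits never take the value 111, so this variant uses 224 of the 256 possible outputs.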

GregS 2009-12-28 18:30:35
